Abstract


This report describes Nesl, a strongly-typed, applicative, data-parallel language. Nesl is
intended to be used as a portable interface for programming a variety of parallel and vector
supercomputers, and as a basis for teaching parallel algorithms. Parallelism is supplied
through a simple set of data-parallel constructs based on sequences (ordered sets), including
a mechanism for applying any function over the elements of a sequence in parallel and a
rich set of parallel functions that manipulate sequences.

Nesl  fully  supports  nested  sequences  and  nested  parallelism — the  ability  to  take  a  parallel 
function  and  apply  it  over  multiple  instances  in  parallel.  Nested  parallelism  is  important  for 
implementing algorithms with complex and dynamically changing data structures, such as those
required in many graph and sparse matrix algorithms. Nesl also provides a mechanism for
calculating  the  asymptotic  running  time  for  a  program  on  various  parallel  machine  models, 
including  the  parallel  random  access  machine  (PRAM).  This  is  useful  for  estimating  running 
times  of  algorithms  on  actual  machines  and,  when  teaching  algorithms,  for  supplying  a  close 
correspondence  between  the  code  and  the  theoretical  complexity. 

This  report  defines  Nesl  and  describes  several  examples  of  algorithms  coded  in  the 
language.  The  examples  include  algorithms  for  median  finding,  sorting,  string  searching, 
finding  prime  numbers,  and  finding  a  planar  convex  hull.  Nesl  currently  compiles  to  an 
intermediate  language  called  Vcode,  which  runs  on  the  Cray  Y-MP,  Connection  Machine 
CM-2,  and  Encore  Multimax.  For  many  algorithms,  the  current  implementation  gives 
performance  close  to  optimized  machine-specific  code  for  these  machines. 

Note: This report is an updated version of CMU-CS-92-103, which described version 2.4 of
the language. The most significant changes in version 2.6 are that it supports polymorphic
types, has an ML-like syntax instead of a Lisp-like syntax, and includes support for I/O.
1 Introduction

This  report  describes  and  defines  the  data-parallel  language  Nesl.  The  language  was 
designed  with  the  following  goals: 

1. To support parallelism by means of a set of data-parallel constructs based on
sequences. These constructs supply parallelism through (1) the ability to apply any
function concurrently over each element of a sequence, and (2) a set of parallel
functions that operate on sequences, such as the permute function, which permutes the
order of the elements in a sequence.

2. To support complete nested parallelism. Nesl fully supports nested sequences, and
the ability to apply any user-defined function over the elements of a sequence, even
if the function is itself parallel and the elements of the sequence are themselves
sequences. Nested parallelism is critical for describing both divide-and-conquer
algorithms and algorithms with nested data structures [5].

3.  To  generate  efficient  code  for  a  variety  of  architectures,  including  both  SIMD  and 
MIMD  machines,  with  both  shared  and  distributed  memory.  Nesl  currently  generates 
a  portable  intermediate  code  called  Vcode  [7],  which  runs  on  the  CRAY  Y-MP,  the 
Connection  Machine  CM-2,  and  the  Encore  Multimax.  Various  benchmark  algorithms 
achieve  very  good  running  times  on  these  machines  [12,  6]. 

4.  To  be  well  suited  for  describing  parallel  algorithms,  and  to  supply  a  mechanism  for 
deriving the theoretical running time directly from the code. Each function in Nesl
has two complexity measures associated with it, the work and step complexities [5]. A
simple equation maps these complexities to the asymptotic running time on a Parallel
Random Access Machine (PRAM) model.

Nesl is a strongly-typed, strict, first-order functional (applicative) language. It runs
within an interactive environment and is loosely based on the ML language [27]. The
language uses sequences (ordered sets) as a primitive parallel data type, and parallelism
is  achieved  exclusively  through  operations  on  these  sequences  [5].  The  set  of  sequence 
functions  supplied  by  Nesl  was  chosen  based  both  on  their  usefulness  on  a  broad  variety 
of  algorithms,  and  on  their  efficiency  when  implemented  on  parallel  machines.  To  promote 
the  use  of  parallelism,  Nesl  supplies  no  serial  looping  constructs  (although  serial  looping 
can  be  simulated  with  recursion),  and  supplies  no  data-structures  that  require  serial  access, 
such  as  lists  in  Lisp  or  ML. 

Nesl is the first data-parallel language whose implementation supports nested
parallelism. Nested parallelism is the ability to take a parallel function and apply it over multiple
instances  in  parallel — for  example,  having  a  parallel  sorting  routine,  and  then  using  it  to 
sort  several  sequences  concurrently.  The  data-parallel  languages  C*  [31],  *Lisp  [24],  and 
Fortran  90  [1]  (with  array  extensions)  support  no  form  of  nested  parallelism.  The  parallel 
collections in these languages can only contain scalars or fixed-size records. There is also no
means in these languages to apply a user-defined function over each element of a collection.
This  prohibits  the  expression  of  any  form  of  nested  parallelism.  The  languages  Connection 
Machine Lisp [38] and Paralation Lisp [32] both supply nested parallel constructs, but
no  implementation  ever  supported  the  parallel  execution  of  these  constructs.  Blelloch  and 
Sabot  implemented  an  experimental  compiler  that  supported  nested-parallelism  for  a  small 
subset  of  Paralation  Lisp  [9],  but  it  was  deemed  near  impossible  to  extend  it  to  the  full 
language. 

A common complaint about high-level data-parallel languages and, more generally, about
the class of Collection-Oriented languages [35], such as SETL [33] and APL [22], is that
it can be hard or impossible to determine approximate running times by looking at the
code. As an example, the β primitive in CM-Lisp (a general communication primitive) is
powerful  enough  that  seemingly  similar  pieces  of  code  could  take  very  different  amounts  of 
time  depending  on  details  of  the  implementation  of  the  operation  and  of  the  data  structures. 
A  similar  complaint  is  often  made  about  the  language  SETL — a  language  with  sets  as  a 
primitive  data  structure.  The  time  taken  by  the  set  operations  in  SETL  is  strongly  affected 
by  how  the  set  is  represented.  This  representation  is  chosen  by  the  compiler. 

For  this  reason,  Nesl  was  designed  so  that  the  built-in  functions  are  quite  simple  and 
so  that  the  asymptotic  complexity  can  be  derived  from  the  code.  To  derive  the  complexity, 
each  function  in  Nesl  has  two  complexity  measures  associated  with  it:  the  work  and  step 
complexities  [5].  The  work  complexity  represents  the  serial  work  executed  by  a  program — 
the  running  time  if  executed  on  a  serial  RAM.  The  step  complexity  represents  the  deepest 
path  taken  by  the  function — the  running  time  if  executed  with  an  unbounded  number  of 
processors.  Simple  composition  rules  can  be  used  to  combine  the  two  complexities  across 
expressions  and,  based  on  Brent’s  scheduling  principle  [10],  the  two  complexities  place 
an  upper  bound  on  the  asymptotic  running  times  for  the  parallel  random  access  machine 
(PRAM)  [16]. 
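As a rough guide to how the two measures combine, Brent's principle in its textbook form can be derived as follows. Here W is the work, S the number of steps, P the number of processors, and w_i the work performed in step i; these symbols are introduced only for this sketch, which ignores the cost of assigning operations to processors. The exact mapping to particular PRAM variants is the subject of Section 1.5 and may differ from this bound.

```latex
T_P \;=\; \sum_{i=1}^{S} \left\lceil \frac{w_i}{P} \right\rceil
    \;\le\; \sum_{i=1}^{S} \left( \frac{w_i}{P} + 1 \right)
    \;=\; \frac{W}{P} + S,
\qquad \text{where } W = \sum_{i=1}^{S} w_i .
```

For instance, under this bound a computation with O(n) work and O(lg n) steps runs in O(n/P + lg n) time on P processors.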

The  current  compiler  translates  Nesl  to  Vcode  [7],  a  portable  intermediate  language. 
The  compiler  uses  a  technique  called  flattening  nested  parallelism  [9]  to  translate  Nesl 
into  the  much  simpler  flat  data-parallel  model  supplied  by  Vcode.  Vcode  is  a  small 
stack-based language with about 100 functions, all of which operate on sequences of atomic
values  (scalars  are  implemented  as  sequences  of  length  1).  A  Vcode  interpreter  has  been 
implemented  for  running  Vcode  on  the  Cray  Y-MP,  Connection  Machine  CM-2,  or  any 
serial  machine  with  a  C  compiler  [6].  The  sequence  functions  in  this  interpreter  have  been 
highly optimized [5, 14] and, for large sequences, the interpretive overhead becomes relatively
small, yielding high efficiencies. For the Encore Multimax, Chatterjee has developed
a  compiler  for  Vcode  [12,  13].  This  compiler  reduces  both  the  synchronization  needed 
among  processors  and  the  memory  traffic  over  the  shared  bus.  Most  of  the  techniques  used 
by  this  Vcode  compiler  should  be  applicable  to  any  MIMD  parallel  machine. 
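The representation idea behind flattening can be illustrated in Python (used here only as a neutral illustration language). The sketch assumes the common segment-descriptor encoding, in which a nested sequence is stored as a vector of subsequence lengths plus one flat vector of values. The helper names flatten_nested and segmented_sum are hypothetical and are not Vcode operations.

```python
from itertools import accumulate

def flatten_nested(nested):
    # Segment descriptor (length of each subsequence) plus one flat value vector.
    lengths = [len(sub) for sub in nested]
    values = [x for sub in nested for x in sub]
    return lengths, values

def segmented_sum(lengths, values):
    # Start offset of each segment: exclusive running sum of the lengths.
    starts = [0] + list(accumulate(lengths))[:-1]
    # Every segment is reduced by the same operation over one flat vector,
    # so a flat data-parallel back end can handle all segments together.
    return [sum(values[s:s + n]) for s, n in zip(starts, lengths)]

lengths, values = flatten_nested([[2, 1], [7, 3, 0], [4]])
print(lengths, values)                 # [2, 3, 1] [2, 1, 7, 3, 0, 4]
print(segmented_sum(lengths, values))  # [3, 10, 4]
```

The nested sum over [[2, 1], [7, 3, 0], [4]] thus becomes a single operation on flat vectors, which is the kind of flat data-parallel model the text attributes to Vcode.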

The  interactive  Nesl  environment  runs  within  Common  Lisp  and  can  be  used  to  run 
Vcode  on  remote  machines.  This  allows  the  user  to  run  the  environment,  including  the 
compiler,  on  a  local  workstation  while  executing  interactive  calls  to  Nesl  programs  on  the 
CRAY  Y-MP  or  CM-2  (or  any  other  workstation,  if  so  desired).  As  in  the  Standard  ML 
of  New  Jersey  compiler  [2],  all  interactive  invocations  are  first  compiled  (in  our  case  into 
Vcode),  and  then  executed. 

Control-parallel languages with some features similar to Nesl include
ID [28, 3], Sisal [25], and Proteus [26]. ID and Sisal are both side-effect free and supply
operations on collections of values.

The  remainder  of  this  section  discusses  the  use  of  sequences  and  nested  parallelism 
in  Nesl,  and  how  complexity  can  be  derived  from  Nesl  code.  Section  2  shows  several 
examples  of  code,  and  Section  3  along  with  Appendix  A  and  Appendix  B  defines  the 
language.  Shortcomings  of  Nesl  include  the  limitation  to  first-order  functions  (there  is  no 
ability to pass functions as arguments). We are currently working on a follow-up to Nesl,
which  will  be  based  on  a  more  rigorous  type  system,  and  will  include  some  support  for 
higher-order  functions. 

1.1  Parallel  Operations  on  Sequences 

Nesl  supports  parallelism  through  operations  on  sequences.  A  sequence  is  an  ordered  set 
and is specified in Nesl using square brackets. For example,

[2,  1,  9,  -3] 

is  a  sequence  of  four  integers.  In  Nesl  all  elements  of  a  sequence  must  be  of  the  same  type, 
and  all  sequences  must  be  of  finite  length.  Parallelism  on  sequences  can  be  achieved  in  two 
ways:  the  ability  to  apply  any  function  concurrently  over  each  element  of  a  sequence,  and 
a  set  of  built-in  parallel  functions  that  operate  on  sequences.  The  application  of  a  function 
over  a  sequence  is  achieved  using  set-like  notation  similar  to  set-formers  in  SETL  [33]  and 
list-comprehensions  in  Miranda  [36]  and  Haskell  [21].  For  example,  the  expression 

{negate(a) : a in [3, -4, -9, 5]};

=>  [-3,  4,  9,  -5]  :  [int] 

negates each element of the sequence [3, -4, -9, 5]. This construct can be read as “in
parallel for each a in the sequence [3, -4, -9, 5], negate a”. The symbol => points to
the  result  of  the  expression,  and  the  expression  [int]  specifies  the  type  of  the  result:  a 
sequence  of  integers.  The  semantics  of  the  notation  differs  from  that  of  SETL,  Miranda  or 
Haskell  in  that  the  operation  is  defined  to  be  applied  in  parallel.  Henceforth  we  will  refer  to 
the  notation  as  the  apply-to-each  construct.  As  with  set  comprehensions,  the  apply-to-each 
construct  also  provides  the  ability  to  subselect  elements  of  a  sequence:  the  expression 

{negate(a) : a in [3, -4, -9, 5] | a < 4};

=>  [-3,  4,  9]  :  [int] 

can be read as, “in parallel for each a in the sequence [3, -4, -9, 5] such that a is less
than  4,  negate  a”.  The  elements  that  remain  maintain  their  order  relative  to  each  other. 
It  is  also  possible  to  iterate  over  multiple  sequences.  The  expression 

{a + b : a in [3, -4, -9, 5]; b in [1, 2, 3, 4]};

=>  [4,  -2,  -6,  9]  :  [int] 

elementwise  adds  the  two  sequences.  A  full  description  of  the  apply-to-each  construct  is 
given  in  Section  3.2. 
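The three apply-to-each examples above can be mirrored with Python list comprehensions, used here purely as an illustration language. Python evaluates them one element at a time, whereas Nesl defines the instances to run in parallel; the results are the same because the instances cannot interact. The helper negate is defined locally for the sketch.

```python
# Apply-to-each examples from the text as Python list comprehensions.
# Python runs these sequentially; Nesl defines the instances to run in
# parallel, but since instances cannot interact the results are identical.

def negate(a):
    return -a

# {negate(a) : a in [3, -4, -9, 5]};
all_negated = [negate(a) for a in [3, -4, -9, 5]]

# {negate(a) : a in [3, -4, -9, 5] | a < 4};  kept elements stay in order
some_negated = [negate(a) for a in [3, -4, -9, 5] if a < 4]

# {a + b : a in [3, -4, -9, 5]; b in [1, 2, 3, 4]};
# zip silently truncates, so this sketch assumes equal-length inputs.
pairwise_sum = [a + b for a, b in zip([3, -4, -9, 5], [1, 2, 3, 4])]

print(all_negated)   # [-3, 4, 9, -5]
print(some_negated)  # [-3, 4, 9]
print(pairwise_sum)  # [4, -2, -6, 9]
```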

In Nesl, any function, whether primitive or user-defined, can be applied to each element
of  a  sequence.  So,  for  example,  we  could  define  a  factorial  function 
Operation         Description                                        Work
* dist(a,l)       Distribute value a to sequence of length l.        S(result)
* #a              Return length of sequence a.                       1
a[i]              Return element at position i of a.                 S(result)
rep(v,d,i)        Replace element at position i of d with v.         S(v), S(d)
[s:e]             Return integer sequence from s to e.               (e - s)
[s:e:d]           Return integer sequence from s to e by d.          (e - s)/d
sum(a)            Return sum of sequence a.                          S(a)
* ⊕_scan(a)       Return scan based on operator ⊕.                   S(a)
count(a)          Count number of true flags in a.                   S(a)
permute(a,i)      Permute elements of a to positions i.              S(a)
* d <- a          Place elements a in d.                             S(a), S(d)
* a -> i          Get values from sequence a based on indices i.     S(i)
pack(a,f)         Pack sequence a based on flags f.                  S(a)
max_index(a)      Return index of the maximum value.                 S(a)
min_index(a)      Return index of the minimum value.                 S(a)
a ++ b            Append sequences a and b.                          S(a) + S(b)
drop(a,n)         Drop first n elements of sequence a.               S(result)
take(a,n)         Take first n elements of sequence a.               S(result)
rotate(a,n)       Rotate sequence a by n positions.                  S(a)
* flatten(a)      Flatten nested sequence a.                         S(a)
* partition(a,l)  Partition sequence a into nested sequence.         S(a)
split(a,f)        Split a into nested sequence based on flags f.     S(a)
bottop(a)         Split a into nested sequence.                      S(a)

Figure 1: List of some of the sequence functions supplied by Nesl. In the work column, S(v)
refers to the size of the object v. The * before certain functions means that those functions are
primitives. All the other functions can be built out of the primitives with at most a constant
factor overhead in both work and number of steps. For ⊕_scan the ⊕ can be one of {plus,
max, min, or, and}. All the sequence functions are described in detail in Appendix B.2. In rep
and <-, the work complexity depends on whether the variable used for d is the final reference
to that variable (arguments are evaluated left to right). If it is the final reference, then the
complexity before the comma is used; otherwise the complexity after the comma is used.
function factorial(i) =
if (i == 1) then 1
else i*factorial(i-1);

=>  factorial  :  int  ->  int 
and  then  apply  it  over  the  elements  of  a  sequence 
{factorial(x) : x in [3,1,7]};

=> [6, 1, 5040] : [int]

In this example, the function name(arguments) = body; construct is used to define
factorial. The function is of type int -> int, indicating a function that maps integers
to integers. The type is inferred by the compiler.
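For comparison, the same user-defined function applied over a sequence looks like this in Python (an illustration language only). Like the Nesl definition, this sketch assumes i >= 1; for i < 1 the recursion would not terminate.

```python
def factorial(i: int) -> int:
    # Same recursion as the Nesl definition; assumes i >= 1.
    return 1 if i == 1 else i * factorial(i - 1)

# {factorial(x) : x in [3,1,7]};  one independent instance per element
results = [factorial(x) for x in [3, 1, 7]]
print(results)  # [6, 1, 5040]
```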

An  apply-to-each  construct  applies  a  body  to  each  element  of  a  sequence.  We  will  call 
each such application an instance. Since there are no side effects in Nesl¹, there is no way
to communicate among the instances of an apply-to-each. An implementation can therefore
execute the instances in any order it chooses without changing the result. In particular,
the instances can be executed in parallel, which gives the apply-to-each its parallel
semantics.

In  addition  to  the  apply-to-each  construct,  a  second  way  to  take  advantage  of  parallelism 
in  Nesl  is  through  a  set  of  sequence  functions.  The  sequence  functions  operate  on  whole 
sequences and all have relatively simple parallel implementations. For example, the function
sum  sums  the  elements  of  a  sequence. 

sum([2, 1, -3, 11, 5]);

=> 16 : int

Since  addition  is  associative,  this  can  be  implemented  on  a  parallel  machine  in  logarithmic 
time  using  a  tree.  Another  common  sequence  function  is  the  permute  function,  which 
permutes  a  sequence  based  on  a  second  sequence  of  indices.  For  example: 

permute("nesl", [2, 1, 3, 0]);

=> "lens" : [char]

In  this  case,  the  4  characters  of  the  string  "nesl"  (the  term  string  is  used  to  refer  to  a 
sequence of characters) are permuted to the indices [2, 1, 3, 0] (n -> 2, e -> 1, s -> 3,
and l -> 0). The implementation of the permute function on a distributed-memory parallel
machine  could  use  its  communication  network  and  the  implementation  on  a  shared-memory 
machine  could  use  an  indirect  write  into  the  memory. 
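The semantics of permute can be sketched in Python (an illustration language only): element j of the value sequence is written to position indices[j] of the result. Because the indices form a permutation, every write targets a distinct slot, which is why the writes can all happen at once on a parallel machine. The validity check is an assumption of this sketch, not something the text specifies.

```python
def permute(values, indices):
    # Element j of `values` goes to position indices[j] of the result.
    # This sketch requires `indices` to be a permutation of 0..n-1.
    if sorted(indices) != list(range(len(values))):
        raise ValueError("indices must be a permutation of 0..n-1")
    result = [None] * len(values)
    for v, i in zip(values, indices):
        result[i] = v  # distinct targets, so the writes are independent
    return result

print("".join(permute("nesl", [2, 1, 3, 0])))  # lens
```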

Figure 1 lists some of the sequence functions available in Nesl. A subset of the functions
(the  starred  ones)  form  a  complete  set  of  primitives.  These  primitives,  along  with  the 
scalar  operations  and  the  apply-to-each  construct,  are  sufficient  for  implementing  the  other 
functions  in  the  table  with  at  most  a  constant  factor  increase  in  both  the  step  and  work 
function kth_smallest(s, k) =
let pivot = s[#s/2];
    lesser = {e in s| e < pivot}
in if (k < #lesser) then kth_smallest(lesser, k)
   else
     let greater = {e in s| e > pivot}
     in if (k >= #s - #greater) then
          kth_smallest(greater, k - (#s - #greater))
        else pivot;

Figure 2: An implementation of order statistics. The function kth_smallest returns the kth
smallest element from the input sequence s.

complexities,  as  defined  in  Section  1.5.  The  table  also  lists  the  work  complexity  of  each 
function,  which  will  also  be  defined  in  Section  1.5. 
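As an illustration of building an unstarred function from primitives, the Python sketch below expresses pack as an exclusive plus scan over the flags (giving each kept element its destination index) followed by one indexed write per kept element. This is one plausible construction, not necessarily the one Nesl uses, and the exclusive convention for plus_scan (first output 0) is an assumption of the sketch.

```python
def plus_scan(a):
    # Exclusive scan: out[i] = a[0] + ... + a[i-1], so out[0] = 0.
    out, total = [], 0
    for x in a:
        out.append(total)
        total += x
    return out

def pack(a, flags):
    # Destination of each kept element = number of kept elements before it.
    dest = plus_scan([1 if f else 0 for f in flags])
    result = [None] * sum(1 for f in flags if f)
    for x, f, d in zip(a, flags, dest):
        if f:
            result[d] = x  # destinations are distinct, so writes are independent
    return result

print(plus_scan([2, 1, -3, 11, 5]))  # [0, 2, 3, 0, 11]
print(pack([4, 8, 2, 3, 1, 7, 2],
           [True, False, True, False, True, False, True]))  # [4, 2, 1, 2]
```

Both steps do work proportional to the length of the input, consistent with the S(a) entry for pack in Figure 1.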

We  now  consider  an  example  of  the  use  of  sequences  in  Nesl.  The  algorithm  we  consider 
solves the problem of finding the kth smallest element in a set s, using a parallel version
of the quickorder algorithm [19]. Quickorder is similar to quicksort, but only calls itself
recursively  on  either  the  elements  lesser  or  greater  than  the  pivot.  The  Nesl  code  for 
the algorithm is shown in Figure 2. The let construct is used to bind local variables (see
Section 3.2.2 for more details). The code first extracts the middle element of the input
sequence s as a pivot. The algorithm then selects all the
elements less than the pivot, and places them in a sequence that is bound to lesser. For
example: 

s = [4, 8, 2, 3, 1, 7, 2]

pivot = 3

{x in s | x < pivot} = [2, 1, 2]

After the pack, if the number of elements in the set lesser is greater than k, then the
kth smallest element must belong to that set. In this case, the algorithm calls kth_smallest
recursively on lesser using the same k. Otherwise, the algorithm selects the elements that
are greater than the pivot, again using pack, and can similarly find if the kth element belongs
in the set greater. If it does belong in greater, the algorithm calls itself recursively, but
must now readjust k by subtracting off the number of elements less than or equal to the
pivot. If the kth element belongs in neither lesser nor greater, then it must be the pivot,
and the algorithm returns this value.
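A direct Python transcription of Figure 2 (Python used only as an illustration language) makes the indexing convention explicit: the test k < #lesser means k counts from 0, so k = 0 asks for the minimum. The sketch assumes 0 <= k < len(s). Each recursive call receives a strictly smaller list, because the pivot is excluded from both lesser and greater.

```python
def kth_smallest(s, k):
    # k is 0-based; assumes 0 <= k < len(s).
    pivot = s[len(s) // 2]
    lesser = [e for e in s if e < pivot]
    if k < len(lesser):
        return kth_smallest(lesser, k)
    greater = [e for e in s if e > pivot]
    if k >= len(s) - len(greater):
        # Skip every element less than or equal to the pivot.
        return kth_smallest(greater, k - (len(s) - len(greater)))
    return pivot

s = [4, 8, 2, 3, 1, 7, 2]
print([kth_smallest(s, k) for k in range(len(s))])  # [1, 2, 2, 3, 4, 7, 8]
```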

1.2  Nested  Parallelism 

In  Nesl  the  elements  of  a  sequence  can  be  any  valid  data  item,  including  sequences.  This 
rule permits the nesting of sequences to an arbitrary depth. A nested sequence can be
written  as 

¹This is not strictly true since some of the utility functions, such as reading or writing from a file, have
side effects. These functions, however, cannot be used within an apply-to-each construct.
Application                  Outer Parallelism                 Inner Parallelism
Sum of Neighbors in Graph    For each vertex of graph          Sum neighbors of vertex
Figure Drawing               For each line of image            Draw pixels of line
Compiling                    For each procedure of program     Compile code of procedure
Text Formatting              For each paragraph of document    Justify lines of paragraph

Table 1: Routines with nested parallelism. Both the inner part and the outer part can be
executed in parallel.


[[2, 1], [7, 3, 0], [4]]

This  sequence  has  type:  [[int]]  (a  sequence  of  sequences  of  integers).  Given  nested 
sequences  and  the  rule  that  any  function  can  be  applied  in  parallel  over  the  elements  of  a 
sequence,  Nesl  necessarily  supplies  the  ability  to  apply  a  parallel  function  multiple  times 
in  parallel;  we  call  this  ability  nested  parallelism.  For  example,  we  could  apply  the  parallel 
sequence  function  sum  over  a  nested  sequence: 

{sum(v) : v in [[2, 1], [7, 3, 0], [4]]};

=> [3, 10, 4] : [int]

In this expression there is parallelism both within each sum, since the sequence function has
a parallel implementation, and across the three instances of sum, since the apply-to-each
construct  is  defined  such  that  all  instances  can  run  in  parallel. 

Nesl  supplies  a  handful  of  functions  for  moving  between  levels  of  nesting.  These  include 
flatten,  which  takes  a  nested  sequence  and  flattens  it  by  one  level.  For  example, 

flatten([[2, 1], [7, 3, 0], [4]]);

=> [2, 1, 7, 3, 0, 4] : [int]

Another  useful  function  is  bottop  (for  bottom  and  top),  which  takes  a  sequence  of  values 
and creates a nested sequence of length 2 with all the elements from the bottom half of the
input  sequence  in  the  first  element  and  elements  from  the  top  half  in  the  second  element  (if 
the  length  of  the  sequence  is  odd,  the  bottom  part  gets  the  extra  element).  For  example, 

bottop("nested parallelism");

=> ["nested pa", "rallelism"] : [[char]]
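The behavior of flatten and bottop described above can be sketched in Python (an illustration language only). For bottop, the split point is the length rounded up to half, so an odd-length input gives its extra element to the bottom part, as the text specifies.

```python
def flatten(nested):
    # Removes exactly one level of nesting, preserving order.
    return [x for sub in nested for x in sub]

def bottop(a):
    # Bottom half first; for odd lengths the bottom half gets the extra element.
    mid = (len(a) + 1) // 2
    return [a[:mid], a[mid:]]

print(flatten([[2, 1], [7, 3, 0], [4]]))  # [2, 1, 7, 3, 0, 4]
print(bottop("nested parallelism"))      # ['nested pa', 'rallelism']
print(bottop([1, 2, 3]))                  # [[1, 2], [3]]
```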
Algorithm                     Outer Parallelism                        Inner Parallelism
Quicksort                     For lesser and greater elements          Quicksort
Mergesort                     For first and second half                Mergesort
Closest Pair                  For each half of space                   Closest Pair
Strassen’s Matrix Multiply    For each of the 7 sub-multiplications    Strassen’s Matrix Multiply
Fast Fourier Transform        For two sets of interleaved points       Fast Fourier Transform

Table 2: Some divide-and-conquer algorithms.


Table  1  lists  several  examples  of  routines  that  could  take  advantage  of  nested  parallelism. 
Nested  parallelism  also  appears  in  most  divide-and-conquer  algorithms.  A  divide-and- 
conquer  algorithm  breaks  the  original  data  into  smaller  parts,  applies  the  same  algorithm 
on  the  subparts,  and  then  merges  the  results.  If  the  subproblems  can  be  executed  in  parallel, 
as  is  often  the  case,  the  application  of  the  subparts  involves  nested  parallelism.  Table  2 
lists  several  examples. 

As an example, consider how the function sum might be implemented:

function my_sum(a) =
if (#a == 1) then a[0]
else
  let r = {my_sum(v) : v in bottop(a)};
  in r[0] + r[1];

This code tests if the length of the input is one, and returns the single element if it is. If
the length is not one, it uses bottop to split the sequence in two parts, and then applies
itself recursively to each part in parallel. When the parallel calls return, the two results are
extracted and added.² The code effectively creates a tree of parallel calls which has depth
lg n, where n is the length of a, and executes a total of n - 1 calls to +.
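The two counts claimed above can be checked with a Python sketch (an illustration language only) that mirrors my_sum but also returns, as hypothetical instrumentation, the number of + operations and the depth of the call tree. Like the Nesl code, it assumes a non-empty input; the depth it reports is lg n rounded up.

```python
def bottop(a):
    # Bottom half first; the bottom half gets the extra element for odd lengths.
    mid = (len(a) + 1) // 2
    return [a[:mid], a[mid:]]

def my_sum(a):
    """Return (total, number_of_additions, depth_of_call_tree); a must be non-empty."""
    if len(a) == 1:
        return a[0], 0, 0
    # In Nesl these two calls are instances of one apply-to-each and may run
    # in parallel; here they simply run one after the other.
    (left, adds_l, depth_l), (right, adds_r, depth_r) = [my_sum(v) for v in bottop(a)]
    return left + right, adds_l + adds_r + 1, max(depth_l, depth_r) + 1

print(my_sum([2, 1, -3, 11, 5]))  # (16, 4, 3)
```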

As another, more involved example, consider a parallel variation of quicksort [4] (see
Figure  3).  When  applied  to  a  sequence  s,  this  version  splits  the  values  into  three  subsets  (the 
elements  lesser,  equal  and  greater  than  the  pivot)  and  calls  itself  recursively  on  the  lesser 
and  greater  subsets.  To  execute  the  two  recursive  calls,  the  lesser  and  greater  sequences 
are  concatenated  into  a  nested  sequence  and  qsort  is  applied  over  the  two  elements  of  the 

²To simulate the built-in sum, it would be necessary to add code to return the identity (0) for empty
sequences.
function qsort(a) =
if (#a < 2) then a
else
  let pivot = a[#a/2];
      lesser = {e in a| e < pivot};
      equal = {e in a| e == pivot};
      greater = {e in a| e > pivot};
      result = {qsort(v): v in [lesser, greater]}
  in result[0] ++ equal ++ result[1];

Figure  3:  An  implementation  of  quicksort. 


Figure  4:  The  quicksort  algorithm.  Just  using  parallelism  within  each  block  yields  a  parallel 
running  time  at  least  as  great  as  the  number  of  blocks  (O(n)).  Just  using  parallelism  from 
running  the  blocks  in  parallel  yields  a  parallel  running  time  at  least  as  great  as  the  largest  block 
(O(n)).  By  using  both  forms  of  parallelism  the  parallel  running  time  can  be  reduced  to  the 
depth of the tree (expected O(lg n)).

nested  sequences  in  parallel.  The  final  line  extracts  the  two  results  of  the  recursive  calls 
and  appends  them  together  with  the  equal  elements  in  the  correct  order. 
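A Python transcription of Figure 3 (Python used only as an illustration language) follows the same structure: a three-way split around the middle element, two recursive calls gathered into one list (the apply-to-each in Nesl), and concatenation with the equal elements in the middle. This sketch assumes list input.

```python
def qsort(a):
    # Mirrors Figure 3; assumes `a` is a list.
    if len(a) < 2:
        return a
    pivot = a[len(a) // 2]
    lesser = [e for e in a if e < pivot]
    equal = [e for e in a if e == pivot]
    greater = [e for e in a if e > pivot]
    # In Nesl this is {qsort(v): v in [lesser, greater]}: both calls may run
    # in parallel. Python evaluates them one after the other.
    result = [qsort(v) for v in [lesser, greater]]
    return result[0] + equal + result[1]

print(qsort([4, 8, 2, 3, 1, 7, 2]))  # [1, 2, 2, 3, 4, 7, 8]
```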

The  recursive  invocation  of  qsort  generates  a  tree  of  calls  that  looks  something  like  the 
tree  shown  in  Figure  4.  In  this  diagram,  taking  advantage  of  parallelism  within  each  block 
as  well  as  across  the  blocks  is  critical  to  getting  a  fast  parallel  algorithm.  If  we  were  only 
to  take  advantage  of  the  parallelism  within  each  quicksort  to  subselect  the  two  sets  (the 
parallelism  within  each  block),  we  would  do  well  near  the  root  and  badly  near  the  leaves 
(there  are  n  leaves  which  would  be  processed  serially).  Conversely,  if  we  were  only  to  take 
advantage  of  the  parallelism  available  by  running  the  invocations  of  quicksort  in  parallel 
(the  parallelism  between  blocks,  but  not  within  a  block),  we  would  do  well  at  the  leaves 
and  badly  at  the  root  (it  would  take  n  time  to  process  the  root).  In  both  cases  the  parallel 
time complexity is O(n) rather than the ideal O(lg n) we can get using both forms (this
is discussed in Section 1.5).
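The three lower bounds in this argument can be made concrete with a hypothetical instrumented version of the quicksort above, written in Python as an illustration. It reports the number of blocks (calls) in the call tree and the depth of that tree; the size of the root block is just n. On sorted distinct input the middle-element pivot splits perfectly, which is the best case; for n = 1023 the tree has 1023 blocks but depth only 10, so using only one form of parallelism costs about n steps while using both costs about the depth.

```python
def qsort_tree_stats(a, depth=1):
    """Return (sorted list, number of calls, depth of the call tree).

    Hypothetical instrumentation of the quicksort in Figure 3: each call is
    one block of the call tree."""
    if len(a) < 2:
        return a, 1, depth
    pivot = a[len(a) // 2]
    lesser = [e for e in a if e < pivot]
    equal = [e for e in a if e == pivot]
    greater = [e for e in a if e > pivot]
    (lo, calls_lo, d_lo), (hi, calls_hi, d_hi) = [
        qsort_tree_stats(v, depth + 1) for v in [lesser, greater]]
    return lo + equal + hi, 1 + calls_lo + calls_hi, max(d_lo, d_hi)

n = 1023
_, blocks, tree_depth = qsort_tree_stats(list(range(n)))
print("blocks:", blocks, "root block:", n, "tree depth:", tree_depth)
```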


1.3  Pairs 

As  well  as  sequences,  Nesl  supports  the  notion  of  pairs.  A  pair  is  a  structure  with  two 
elements,  each  of  which  can  be  of  any  type.  Pairs  are  often  used  to  build  simple  structures 
or  to  return  multiple  values  from  a  function.  The  binary  comma  operator  is  used  to  create 
pairs.  For  example: 

9.8,"foo"; 

=> (9.8, "foo") : (float, [char])

2,3; 

=>  (2,3)  :  (int,  int) 

The comma operator is right associative (e.g. (2,3,4,5) is equivalent to (2,(3,(4,5)))).
All other binary operators in Nesl are left associative. The precedence of the comma
operator  is  lower  than  any  other  binary  operator,  so  it  is  usually  necessary  to  put  pairs 
within  parentheses. 

Pattern  matching  inside  of  a  let  construct  can  be  used  to  deconstruct  structures  of 
pairs.  For  example: 

let (a,b,c) = (2*4, 5-2, 4)
in  a+b*c; 

=>  20  :  int 

In  this  example,  a  is  bound  to  8,  b  is  bound  to  3,  and  c  is  bound  to  4. 
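Python tuples are flat, so the right-nested structure that a chain of commas denotes in Nesl has to be built explicitly to see it. The helper right_nested below is hypothetical and exists only to reproduce the grouping described above and the let example; it assumes at least two items.

```python
def right_nested(*items):
    # Builds the nested pairs that a comma chain denotes: 2,3,4,5 -> (2,(3,(4,5))).
    if len(items) == 2:
        return (items[0], items[1])
    return (items[0], right_nested(*items[1:]))

print(right_nested(2, 3, 4, 5))  # (2, (3, (4, 5)))

# let (a,b,c) = (2*4, 5-2, 4) in a+b*c;  the pattern (a,b,c) is really (a,(b,c))
a, (b, c) = right_nested(2 * 4, 5 - 2, 4)
print(a + b * c)  # 20
```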

Nested  pairs  differ  from  sequences  in  several  important  ways.  Most  importantly,  there 
is  no  way  to  operate  over  the  elements  of  a  nested  pair  in  parallel.  A  second  important 
difference  is  that  the  elements  of  a  pair  need  not  be  of  the  same  type,  while  elements  of  a 
sequence  must  always  be  of  the  same  type. 

1.4  Types 

Nesl is a strongly typed polymorphic language with a type inference system. Its type
system is similar to that of functional languages such as ML, but since it is first-order
(functions cannot be passed as data), function types only appear at the top level. Type
variables of polymorphic functions can therefore range over all the data-types. As well as
parametric polymorphism, Nesl also allows a form of overloading similar to what is supplied
by the Haskell language [21].

The  type  of  a  polymorphic  function  in  Nesl  is  specified  by  using  type-variables,  which 
are  declared  in  a  type-context.  For  example,  the  type  of  the  permute  function  is: 

[A], [int] -> [A] :: A in any
                 any
               /     \
         ordinal      ALL OTHER DATA TYPES
        /   |   \
       /  number  logical
      /   /   \   /   \
   CHAR FLOAT  INT    BOOL

Figure 5: The type-class hierarchy of Nesl. The lowercase names are the type classes.

This specifies that for A bound to any type, permute maps a sequence of type [A] and a
sequence of type [int] into another sequence of type [A]. The variable A is a type-variable,
and  the  specification  A  in  any  is  the  context.  A  context  can  have  multiple  type  bindings 
separated  by  semicolons.  For  example,  the  pair  function  described  in  the  last  section  has 
type: 


A, B -> (A,B) :: A in any; B in any

User-defined functions can also be polymorphic. For example, we could define

function append3(s1,s2,s3) = s1 ++ s2 ++ s3;

=>  append3(sl,s2,s3)  :  [A],  [A],  [A]  ->  [A]  ::  A  in  any 

The  type  inference  system  will  always  determine  the  most  general  type  possible. 

In addition to parametric polymorphism, Nesl supports a form of overloading by including
the notion of type-classes. A type-class is a set of types along with an associated
set  of  functions.  The  functions  of  a  class  can  only  be  applied  to  the  types  from  that  class. 
For  example  the  base  types,  int  and  float  are  both  members  of  the  type  class  number, 
and  numerical  functions  such  as  +  and  *  are  defined  to  work  on  all  numbers.  The  type  of 
a  function  overloaded  in  this  way,  is  specified  by  limiting  the  context  of  a  type-variable  to 
a  specific  type-class.  For  example,  the  type  of  +  is: 

A,  A  ->  A  ::  A  in  number 

The context “A in number” specifies that A can be bound to any member of the type-class number. The fully polymorphic specification any can be thought of as a type-class that contains all data types as members. The type-classes are organized into the hierarchy shown in Figure 5. Functions such as == and < are defined on ordinals, functions such as + and * are defined on numbers, and functions such as or and not are defined on logicals. User-defined functions can also be overloaded. For example:

function double(a) = a + a;

=> double(a) : A -> A :: A in number

It is also possible to restrict the type of a user-defined function by explicitly typing it. For
example, 

function double(a) : int -> int = a + a;

=> double(a) : int -> int

limits the type of double to int -> int. The : specifies that the next form is a type-specifier (see Appendix A for the full syntax of the function construct and type specifiers).

In certain situations the type inference system cannot determine the type even though there is one. For example, the function:

function badfunc(a,b) = a or (a + b);

will not type properly because or is defined on the type-class logical and + is defined on the type-class number. As it so happens, int is both a logical and a number, but the Nesl inference system does not know how to take intersections of type-classes. In this situation it is necessary to specify the type:

function goodfunc(a,b) : int, int -> int = a or (a + b);

=> goodfunc(a,b) : int, int -> int

This situation comes up quite rarely.

Specifying the type explicitly serves as good documentation for a function even when the inference system can determine the type. The notion of type-classes in Nesl is similar to the type-classes used in the Haskell language [21], but, unlike Haskell, Nesl currently does not permit the user to add new type-classes.³

1.5  Deriving  Complexity 

There  are  two  complexities  associated  with  all  computations  in  Nesl. 

1. Work complexity: this represents the total work done by the computation, that is to say, the amount of time that the computation would take if executed on a serial random access machine. The work complexity of most of the sequence functions is simply the size of one of their arguments. A complete list is given in Table 1. The size of an object is defined recursively: the size of a scalar value is 1, and the size of a sequence is the sum of the sizes of its elements plus 1.

2.  Step  complexity:  this  represents  the  parallel  depth  of  the  computation,  that  is  to 
say,  the  amount  of  time  the  computation  would  take  on  a  machine  with  an  unbounded 
number  of  processors.  The  step  complexity  of  all  the  sequence  functions  supplied  by 
Nesl  is  constant. 
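The recursive size definition used for the work complexity can be written down directly. The following Python sketch is an illustration only: it models Nesl sequences as Python lists and, as an assumption not stated in the text, treats every non-list value (including pairs) as a scalar of size 1.

```python
def size(value):
    """Size of a value in the cost model: a scalar has size 1, and a
    sequence (modelled as a Python list) has size 1 plus the sum of the
    sizes of its elements. Non-list values are assumed to be scalars."""
    if isinstance(value, list):
        return 1 + sum(size(element) for element in value)
    return 1


print(size(7))                  # 1
print(size([1, 2, 3]))          # 4
print(size([[1, 2], [3], []]))  # 7
```

The "plus 1" per sequence means that even an empty sequence has size 1, so nested sequences with many empty subsequences still cost work proportional to their number.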

The work and step complexities are based on the vector random access machine (VRAM) model [5], a strictly data-parallel abstraction of the parallel random access machine (PRAM) model [16]. Since the complexities are meant for determining asymptotic complexity, they do not include constant factors. All the Nesl functions, however, can be executed in a small number of machine instructions per element.

³ It is likely that future versions of Nesl will allow such extensions.

The complexities are combined using simple combining rules. Expressions are combined in the standard way: for both the work complexity and the step complexity, the complexity of an expression is the sum of the complexities of the arguments plus the complexity of the call itself. For example, the complexities of the computation:

sum(dist(7,n)) * #a

can be calculated as:

                 Work    Step
  dist           n       1
  sum            n       1
  # (length)     1       1
  *              1       1
  Total          O(n)    O(1)
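The expression rule can be mimicked with a tiny cost calculator. In this Python sketch the Cost record and the call_cost and example_cost helpers are hypothetical names; the per-call costs are copied from the table, and constant arguments such as 7 and n are assumed to cost nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    work: int
    step: int


def call_cost(call, *args):
    """Cost of an expression: the cost of the call itself plus the costs of
    all its argument expressions, summed for both work and steps."""
    return Cost(
        work=call.work + sum(a.work for a in args),
        step=call.step + sum(a.step for a in args),
    )


def example_cost(n):
    """Cost of sum(dist(7,n)) * #a using the per-call costs in the table."""
    dist_c = call_cost(Cost(n, 1))                 # dist(7,n): work n, 1 step
    sum_c = call_cost(Cost(n, 1), dist_c)          # sum(...): work n, 1 step
    length_c = call_cost(Cost(1, 1))               # #a: work 1, 1 step
    return call_cost(Cost(1, 1), sum_c, length_c)  # the final *
```

For n = 1000 this gives Cost(work=2002, step=4): the work grows linearly with n while the step count stays fixed, matching the O(n) and O(1) totals.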

The apply-to-each construct is combined in the following way. The work complexity is the sum of the work complexities of the instantiations, and the step complexity is the maximum over the step complexities of the instantiations. If we denote the work required by an expression exp applied to some data a as W(exp(a)), and the steps required as S(exp(a)), these combining rules can be written as

$$W(\{e_1(a) : a \in e_2(b)\}) = W(e_2(b)) + \mathrm{sum}(\{W(e_1(a)) : a \in e_2(b)\}) \tag{1}$$

$$S(\{e_1(a) : a \in e_2(b)\}) = S(e_2(b)) + \mathrm{max\_val}(\{S(e_1(a)) : a \in e_2(b)\}) \tag{2}$$

where sum and max_val take the sum and maximum of a sequence, respectively.

As an example, the complexities of the computation:

{[0:i] : i in [0:n]}

can be calculated as:

                    Work                  Step
  [0:n]             n                     1
  Parallel calls
    [0:i]           Σ_{i=0}^{n-1} i       max_{i=0}^{n-1} 1
  Total             O(n²)                 O(1)
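Equations (1) and (2) can be checked on this example with a small calculator. The sketch below is illustrative only: the function names are hypothetical, and it assumes that building [0:k] costs k work and one step, in line with the table.

```python
def apply_to_each_cost(source_cost, instance_costs):
    """Combine (work, step) costs as in Equations (1) and (2).

    source_cost    -- cost of evaluating the sequence expression e2(b)
    instance_costs -- one (work, step) pair per instantiation of e1(a)
    Work adds over all instantiations; steps take the maximum, because the
    instantiations run in parallel.
    """
    source_work, source_step = source_cost
    work = source_work + sum(w for w, _ in instance_costs)
    step = source_step + max((s for _, s in instance_costs), default=0)
    return work, step


def range_cost(k):
    # Assumed cost of building [0:k]: work k, one step.
    return (k, 1)


def nested_ranges_cost(n):
    """Cost of {[0:i] : i in [0:n]}."""
    return apply_to_each_cost(range_cost(n), [range_cost(i) for i in range(n)])
```

nested_ranges_cost(100) returns (5050, 2): the work is quadratic in n, while the step count is the same for every n ≥ 1 because the inner ranges are built in parallel.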

Once the work (W) and step (S) complexities have been calculated in this way, the formula

$$T = O(W/P + S \lg P) \tag{3}$$

places an upper bound on the asymptotic running time of an algorithm on the CRCW PRAM model (P is the number of processors). This formula can be derived from Brent’s scheduling principle [10] as shown in [34, 5, 23]. The lg P term shows up because of the cost of allocating tasks to processors, and the cost of implementing the sum and scan operations. On the scan-PRAM [4], where it is assumed that the scan operations are no more expensive than references to the shared memory (they both require O(lg P) time on a machine with bounded-degree circuits), the equation is:

$$T = O(W/P + S) \tag{4}$$
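Since Equations (3) and (4) hide constant factors, they only give asymptotic bounds, but evaluating them with all constants set to 1 shows how the two terms trade off as P grows. The helper pram_time_bound below is a hypothetical illustration, not part of Nesl.

```python
import math


def pram_time_bound(work, steps, processors, scan_pram=False):
    """Evaluate W/P + S lg P (Equation 3), or W/P + S (Equation 4) when
    scan_pram is True, with every hidden constant taken to be 1."""
    if scan_pram:
        return work / processors + steps
    return work / processors + steps * math.log2(processors)
```

For W = 2^20, S = 20 and P = 1024, the bound evaluates to 1224 under Equation (3) and 1044 under Equation (4); in both cases the W/P term still dominates.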

In the mapping onto a PRAM, the only reason a concurrent-write capability is required is for implementing the <- (put) function, and the only reason a concurrent-read capability is required is for implementing the -> (get) function. Both of these functions allow repeated indices (“collisions”) and could therefore require concurrent access to a memory location. If an algorithm does not use these functions, or guarantees that there are no collisions when they are used, then the mapping can be implemented with an EREW PRAM. Of the algorithms in this paper, the primes algorithm (Section 2.2) requires concurrent writes, and the string-searching algorithm (Section 2.1) requires concurrent reads. All the other algorithms can use an EREW PRAM.

As an example of how the work and step complexities can be used, consider the kth-smallest algorithm described earlier (Figure 2). In this algorithm the work is the same as the time required by the standard serial version (loops have been replaced by parallel calls), which has an expected time of O(n) [15]. It is also not hard to show that the expected number of recursive calls is O(lg n), since we expect to drop some fraction of the elements on each recursive call [30]. Since each recursive call requires a constant number of steps, we therefore have:

$$W(n) = O(n), \qquad S(n) = O(\lg n)$$

Using Equations 3 and 4, this gives us an expected-case running time on a PRAM (assuming P ≤ n) of:

$$T(n) = O(n/P + \lg n \lg P) = O(n/P + \lg^2 n) \quad \text{(EREW PRAM)}$$
$$T(n) = O(n/P + \lg n) \quad \text{(scan-PRAM)}$$

One can similarly show for the quicksort algorithm given in Figure 3 that the work and step complexities are W(n) = O(n lg n) and S(n) = O(lg n) [30], which give running times of:

$$T(n) = O(n \lg n / P + \lg^2 n) \quad \text{(EREW PRAM)}$$
$$T(n) = O(n \lg n / P + \lg n) \quad \text{(scan-PRAM)}$$
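The O(lg n) expected number of recursive calls claimed for kth-smallest can also be observed empirically. The following is a sequential Python model of a randomized selection in the style analysed above, written for illustration; it is not a transcription of Figure 2, and details such as a zero-based k and the handling of elements equal to the pivot are assumptions. Each round corresponds to one recursive call, so the returned round count tracks the step complexity up to a constant.

```python
import random


def kth_smallest(seq, k, rounds=0):
    """Return (element of rank k, rounds used); k is zero-based.

    Each round picks a random pivot and filters the sequence. In Nesl every
    filter is one parallel subselection, so each round costs O(1) steps.
    """
    pivot = random.choice(seq)
    lesser = [x for x in seq if x < pivot]
    if k < len(lesser):
        return kth_smallest(lesser, k, rounds + 1)
    greater = [x for x in seq if x > pivot]
    num_equal = len(seq) - len(lesser) - len(greater)
    if k < len(lesser) + num_equal:
        return pivot, rounds + 1
    return kth_smallest(greater, k - len(lesser) - num_equal, rounds + 1)
```

On shuffled inputs the round count grows roughly logarithmically with the input length, while the total number of elements examined stays linear in expectation.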

In  the  remainder  of  this  paper  we  will  only  derive  the  work  and  step  complexities.  The 
reader  can  plug  these  into  Equation  3  or  Equation  4  to  get  the  PRAM  running  times. 
2  Examples

This section describes several examples of Nesl programs. Before describing the examples, we describe three common operations: get, put and integer ranges. The -> binary operator (called get) is used to extract multiple elements from a sequence. Its left argument is the sequence to extract from, and the right argument is a sequence of integer indices which specify from which locations to extract elements. For example, the expression

"an example"->[7, 0, 8, 3];

=> "pale" : [char]

extracts the p, a, l and e from locations 7, 0, 8 and 3, respectively. The <- binary operator (called put) is used to insert multiple elements into a sequence. Its left argument is the sequence to insert into (the destination sequence) and its right argument is a sequence of integer-value pairs. For each element (i,v) in the sequence of pairs, the value v is inserted into position i of the destination sequence. For example, the expression

"an example"<-[(4,‘s),(2,‘d),(3,space)];

=> "and sample" : [char]

inserts the s, d and space into the string "an example" at locations 4, 2 and 3, respectively (space is a constant that is bound to the space character).
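The semantics of get and put can be modelled sequentially in a few lines. In this Python sketch, get and put are hypothetical stand-ins for the -> and <- operators; when several pairs write to the same index, the sketch simply keeps the last write, which is an assumption of the model rather than a statement about Nesl.

```python
def get(seq, indices):
    """Model of -> (get): result[j] = seq[indices[j]].
    Repeated indices are allowed; they correspond to concurrent reads."""
    return [seq[i] for i in indices]


def put(dest, pairs):
    """Model of <- (put): copy dest, then write each value v at index i.
    On a collision (two pairs with the same i) the last pair wins here."""
    result = list(dest)
    for i, v in pairs:
        result[i] = v
    return result
```

For instance, get("an example", [7, 0, 8, 3]) yields the characters of "pale", and put("an example", [(4, "s"), (2, "d"), (3, " ")]) yields the characters of "and sample".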

Ranges  of  integers  can  be  created  using  square  brackets  along  with  a  colon.  The  notation 
[start :  end]  creates  a  sequence  of  integers  starting  at  start  and  ending  one  before  end. 
For  example: 

[10:16]; 

=> [10, 11, 12, 13, 14, 15] : [int]

An additional stride can be specified, as in [start:end:stride], which returns every stride-th integer between start and end. For example:

[10:25:3]; 

=>  [10,  13,  16,  19,  22]  :  [int] 

The  integer  end  is  never  included  in  the  sequence. 

Using  these  operations,  it  is  easy  to  define  many  of  the  other  Nesl  functions.  Figure  6 
shows  several  examples. 

2.1  String  Searching 

The first example is a function that finds all occurrences of a word in a string (a sequence of characters). The function string_search(w,s) (see Figure 7) takes a search word w and a string s, and returns the starting positions of all substrings in s that match w. For example,

string_search("foo", "fobarfoofboofoo");

=> [5, 12] : [int]
function subseq(a,start,end) = a->[start:end];

function take(a,n) = a->[0:n];

function drop(a,n) = a->[n:#a];

function rotate(a,n) = a->{mod(i-n,#a) : i in [0:#a]};

function even_elts(a) = a->[0:#a:2];

function odd_elts(a) = a->[1:#a:2];

function bottop(a) = [a->[0:#a/2], a->[#a/2:#a]];

Figure 6: Possible implementations of several of the Nesl functions on sequences.
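The Figure 6 definitions translate almost one-to-one into Python if get is modelled as list indexing; Python's range(start, end, stride) also excludes end, just like [start:end:stride]. The sketch below is illustrative only; in particular, rotate is written as a right rotation by n, result[i] = a[(i - n) mod #a], which is our assumption about the intended behaviour.

```python
def get(seq, indices):
    """Sequential model of -> (get)."""
    return [seq[i] for i in indices]


def subseq(a, start, end):
    return get(a, range(start, end))


def take(a, n):
    return get(a, range(0, n))


def drop(a, n):
    return get(a, range(n, len(a)))


def rotate(a, n):
    # result[i] = a[(i - n) mod #a]; Python's % is non-negative for a
    # positive modulus, so this rotates right by n positions.
    return get(a, [(i - n) % len(a) for i in range(len(a))])


def even_elts(a):
    return get(a, range(0, len(a), 2))


def odd_elts(a):
    return get(a, range(1, len(a), 2))


def bottop(a):
    half = len(a) // 2
    return [get(a, range(0, half)), get(a, range(half, len(a)))]
```

Every function here is a single get with a computed index sequence, which is why each of them needs only a constant number of parallel steps.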


function next_character(candidates, w, s, i) =
  if (i == #w) then candidates
  else
    let letter = w[i];
        next_l = s->{c + i : c in candidates};
        candidates = {c in candidates; n in next_l | n == letter}
    in next_character(candidates, w, s, i+1);

function string_search(w, s) = next_character([0:#s - #w + 1], w, s, 0);

Figure 7: Finding all occurrences of the word w in the string s.
The algorithm starts by considering all positions between 0 and #s - #w as candidates for a match (no candidate could be greater than this since it would have to match past the end of the string). The candidates are stored as pointers (indices) into s of the beginning of each match. The algorithm then progresses through the search word, using recursive calls to next_character, narrowing the set of candidate matches on each step.

Based on the current candidates, next_character narrows the set of candidates by only keeping the candidates that match on the next character of w. To do this, each candidate checks whether the ith character in w matches the ith position past the candidate index. All candidates that do match are packed and passed into the recursive call of next_character. The recursion completes when the algorithm reaches the end of w. The progression of candidates in the "foo" example would be:

  i    candidates
  0    [0, 5, 8, 12]
  1    [0, 5, 12]
  2    [5, 12]

Let's consider the complexity of the algorithm. We assume #w = m and #s = n. The number of steps taken by the algorithm is some constant times the number of recursive calls, which is simply O(m). The work complexity of the algorithm is the sum over the calls of the number of candidates in each step. In practice, this is usually O(n), but in the worst case it can be the product of the two lengths, O(nm) (the worst case can only happen if most of the characters in w are repeated). There are parallel string-searching algorithms that give better bounds on the parallel time (step complexity), and that bound the worst-case work complexity to be linear in the length of the search string [11, 37], but these algorithms are somewhat more complicated.

2.2  Primes 

Our second example finds all the primes less than n. The algorithm is based on the sieve of Eratosthenes. The basic idea of the sieve is to find all the primes less than √n, and then use multiples of these primes to “sieve out” all the composite numbers less than n. Since all composite numbers less than n must have a divisor less than √n, the only elements left unsieved will be the primes. There are many parallel versions of the prime sieve, and several naive versions require a total of O(n^{3/2}) work and either O(n^{1/2}) or O(n) parallel time. A well-designed version should require no more work than the serial sieve (O(n lg lg n)), and polylogarithmic parallel time.

The version we use (see Figure 8) requires O(n lg lg n) work and O(lg lg n) steps. It works by first recursively finding all the primes up to √n (sqr_primes). Then, for each prime p in sqr_primes, the algorithm generates all the multiples of p up to n (sieves). This is done with the [s:e:d] construct. The sequence sieves is therefore a nested sequence with each subsequence being the sieve for one of the primes in sqr_primes. The function flatten is then used to flatten this nested sequence by one level, returning a sequence containing all the sieves. For example,
function primes(n) =
  if n == 2 then [2]
  else
    let sqr_primes = primes(ceil(sqrt(float(n))));
        sieves = {[2*p:n:p] : p in sqr_primes};
        flat_sieves = flatten(sieves);
        flags = dist(t,n) <- {(i,f) : i in flat_sieves}
    in drop({i in [0:n]; flag in flags | flag}, 2);

Figure 8: Finding all the primes less than n.

flatten([[4, 6, 8, 10, 12, 14, 16, 18], [6, 9, 12, 15, 18]]);

=> [4, 6, 8, 10, 12, 14, 16, 18, 6, 9, 12, 15, 18] : [int]

This sequence of sieves is used by the <- function to place a false flag in all positions that are a multiple of one of the sqr_primes. This will return a boolean sequence, flags, which contains a t in all places that were not knocked out by a sieve; these are the primes. However, we want primes to return the indices of the primes instead of flags. To generate these indices, the algorithm creates a sequence of all indices between 0 and n ([0:n]) and uses subselection to remove the nonprimes. The function drop is then used to remove the first two elements (0 and 1), which are not considered primes but do not get explicitly sieved.

The functions [s:e:d], flatten, dist, <- and drop all require a constant number of steps. Since primes is called recursively on a problem of size √n, the total number of steps required by the algorithm can be written as the recurrence:

$$S(n) = \begin{cases} S(\sqrt{n}) + O(1) & n > 2 \\ O(1) & n = 2 \end{cases} = O(\lg \lg n)$$
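The O(lg lg n) bound follows because each recursive call replaces n by roughly √n, so after k calls the problem size is about n^(1/2^k). The following Python sketch (hypothetical helper names) simply counts these calls, using exact integer square roots for the ceiling.

```python
import math


def ceil_sqrt(n):
    """Exact ceiling of the square root for integers n >= 1."""
    return math.isqrt(n - 1) + 1


def sieve_levels(n):
    """Number of recursive calls made before reaching the base case n == 2,
    following n -> ceil(sqrt(n)) as in the primes recursion."""
    levels = 0
    while n > 2:
        n = ceil_sqrt(n)
        levels += 1
    return levels
```

For n = 2^16 the chain is 65536, 256, 16, 4, 2, i.e. 4 = lg lg 2^16 calls; for n = 2^32 it is 5.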

Almost all the work done by primes is done in the first call. In this first call, the work is proportional to the length of the sequence flat_sieves. Using the standard formula

$$\sum_{p \le x} \frac{1}{p} = \log\log x + c + O(1/\log x),$$

where p ranges over the primes [18], the length of this sequence is:

$$\sum_{p < \sqrt{n}} \frac{n}{p} = O(n \log\log \sqrt{n}) = O(n \log\log n),$$

therefore giving a work complexity of O(n log log n).
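For experimentation, here is a sequential Python sketch that mirrors the structure of Figure 8 (recursive sqr_primes, nested sieves, flatten, flag writes, subselection and drop). It is an illustration under stated assumptions: the base case returns an empty list for n ≤ 2 instead of [2], so that primes(2) contains no primes and small n terminate, and the <- writes are performed by a sequential loop.

```python
import math


def primes(n):
    """All primes less than n, structured like Figure 8 (sequential sketch)."""
    if n <= 2:
        return []  # assumption: empty base case instead of [2]
    # Recursively find the primes below ceil(sqrt(n)).
    sqr_primes = primes(math.isqrt(n - 1) + 1)
    # One sieve per prime: its multiples 2p, 3p, ... below n ([2*p:n:p]).
    sieves = [range(2 * p, n, p) for p in sqr_primes]
    flat_sieves = [i for sieve in sieves for i in sieve]  # flatten
    flags = [True] * n                                     # dist(t, n)
    for i in flat_sieves:                                  # flags <- {(i, f)}
        flags[i] = False
    # Subselect the unsieved indices, then drop 0 and 1.
    return [i for i, flag in enumerate(flags) if flag][2:]
```

The length of flat_sieves is exactly the quantity summed in the work analysis above, so timing this function for growing n gives a rough feel for the O(n log log n) work.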
[A B C D E F G H I J K L M N O P]
A [B D F G H J K M O] P [C E I L N]
A [B F] J [O] P N [C E]
A B J O P N C

Figure  9:  An  example  of  the  quickhull  algorithm.  Each  sequence  shows  one  step  of  the 
algorithm.  Since  A  and  P  are  the  two  x  extrema,  the  line  AP  is  the  original  split  line.  J  and 
N  are  the  farthest  points  in  each  subspace  from  AP  and  are,  therefore,  used  for  the  next  level 
of  splits.  The  values  outside  the  brackets  are  hull  points  that  have  already  been  found. 
function cross_product(o,line) =
  let (xo,yo) = o;
      ((x1,y1),(x2,y2)) = line
  in (x1-xo)*(y2-yo) - (y1-yo)*(x2-xo);

function hsplit(points,p1,p2) =
  let cross = {cross_product(p,(p1,p2)) : p in points};
      packed = {p in points; c in cross | plusp(c)}
  in if (#packed < 2) then [p1] ++ packed
     else
       let pm = points[max_index(cross)]
       in flatten({hsplit(packed,p1,p2) : p1 in [p1,pm]; p2 in [pm,p2]});

function convex_hull(points) =
  let x = {x : (x,y) in points};
      minx = points[min_index(x)];
      maxx = points[max_index(x)]
  in hsplit(points,minx,maxx) ++ hsplit(points,maxx,minx);

Figure 10: Code for Quickhull. Each point is represented as a pair. Pattern matching is used to extract the x and y coordinates of each pair.

2.3  Planar  Convex-Hull 

Our next example solves the planar convex hull problem: given n points in the plane, find which of these points lie on the perimeter of the smallest convex region that contains all the points. The planar convex hull problem has many applications ranging from computer graphics [17] to statistics [20]. The algorithm we use to solve the problem is a parallel version [8] of the quickhull algorithm [29]. The quickhull algorithm was given its name because of its similarity to the quicksort algorithm. As with quicksort, the algorithm picks a “pivot” element, splits the data based on the pivot, and is recursively applied to each of the split sets. Also, as with quicksort, the pivot element is not guaranteed to split the data into equally sized sets, and in the worst case the algorithm will require O(n²) work.

Figure 9 shows an example of the quickhull algorithm, and Figure 10 shows the code. The algorithm is based on the recursive routine hsplit. This function takes a set of points in the plane ((x,y) coordinates) and two points p1 and p2 that are known to lie on the convex hull, and returns all the points that lie on the hull clockwise from p1 to p2, inclusive of p1, but not of p2. In Figure 9, given all the points [A, B, C, ..., P], p1 = A and p2 = P, hsplit would return the sequence [A, B, J, O]. In hsplit, the order of p1 and p2 matters, since if we switch A and P, hsplit would return the hull along the other direction, [P, N, C].

The hsplit function works by first removing all the elements that cannot be on the hull since they lie below the line between p1 and p2. This is done by removing elements whose cross product with the line between p1 and p2 is not positive. In the case p1 = A and p2 = P, the points [B, D, F, G, H, J, K, M, O] would remain and be placed in the sequence packed. The algorithm now finds the point furthest from the line p1-p2. This point pm must be on the hull since, as a line at infinity parallel to p1-p2 moves toward p1-p2, it must first hit pm. The point pm (J in the running example) is found by taking the point with the maximum cross-product; since the line p1-p2 is fixed, the cross product is proportional to the distance from that line. Once pm is found, hsplit calls itself twice recursively using the points (p1, pm) and (pm, p2) ((A, J) and (J, P) in the example). When the recursive calls return, hsplit flattens the result (this effectively appends the two subhulls).

The overall convex-hull algorithm works by finding the points with minimum and maximum x coordinates (these points must be on the hull) and then using hsplit to find the upper and lower hull. Each recursive call has a step complexity of O(1) and a work complexity of O(n). However, since many points might be deleted on each step, the work complexity could be significantly less. For m hull points, the algorithm runs in O(lg m) steps for well-distributed hull points, and has a worst-case running time of O(m) steps.
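The quickhull code of Figure 10 can be transcribed into Python as a sequential sketch, with list comprehensions standing in for the apply-to-each constructs. Points are (x, y) tuples; max and min with a key play the role of max_index and min_index (returning the first extremum on ties, which is an assumption about tie-breaking), and collinear or duplicate input points are not specially handled.

```python
def cross_product(o, line):
    """(p1 - o) x (p2 - o): positive when o lies to the left of the directed
    line p1 -> p2, with magnitude proportional to o's distance from it."""
    xo, yo = o
    (x1, y1), (x2, y2) = line
    return (x1 - xo) * (y2 - yo) - (y1 - yo) * (x2 - xo)


def hsplit(points, p1, p2):
    """Hull points clockwise from p1 (inclusive) to p2 (exclusive)."""
    cross = [cross_product(p, (p1, p2)) for p in points]
    # Keep only points strictly on the outer side of the line p1 -> p2.
    packed = [p for p, c in zip(points, cross) if c > 0]
    if len(packed) < 2:
        return [p1] + packed
    # Farthest point from the line (first one on ties), like max_index.
    pm = points[max(range(len(points)), key=cross.__getitem__)]
    # The two recursive calls are independent; Nesl runs them in parallel.
    return hsplit(packed, p1, pm) + hsplit(packed, pm, p2)


def convex_hull(points):
    """Points on the convex hull in clockwise order, starting at min x."""
    xs = [x for x, _ in points]
    minx = points[min(range(len(points)), key=xs.__getitem__)]
    maxx = points[max(range(len(points)), key=xs.__getitem__)]
    return hsplit(points, minx, maxx) + hsplit(points, maxx, minx)
```

For the square [(0, 0), (2, 0), (2, 2), (0, 2)] plus the interior point (1, 1), convex_hull returns [(0, 0), (0, 2), (2, 2), (2, 0)], i.e. the four corners in clockwise order starting from the leftmost point.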

3  Language  Definition 

This section defines Nesl. It is not meant as a formal semantics but, along with the full definition of the syntax in Appendix A and the description of all the built-in functions in Appendix B, it should serve as an adequate description of the language. Nesl is a strict, first-order, strongly-typed language with the following data types:

•  four  primitive  atomic  datatypes:  booleans  (bool),  integers  (int),  characters  (char), 
and  floats  (float); 

•  the  primitive  sequence  type; 

•  the  primitive  pair  type; 

•  and user-definable compound datatypes;
and  the  following  operations: 

•  a  set  of  predefined  functions  on  the  primitive  types; 

•  three primitive constructs: a conditional construct if, a binding construct let, and the apply-to-each construct;

•  and  a  function  constructor,  function,  for  defining  new  functions. 

This  section  covers  each  of  these  topics. 

3.1  Data 

3.1.1  Atomic  Data  Types 

There are four primitive atomic data types: booleans, integers, characters and floats.

The boolean type bool can have one of two values, t or f. The standard logical operations (e.g., not, and, or, xor, nor, nand) are predefined. The operations and, or, xor, nor and nand all use infix notation. For example:
not(not(t));

=> t : bool

t xor f;

=> t : bool

The integer type int is the set of (positive and negative) integers that can be represented in the fixed precision of a machine-sized word. The exact precision is machine dependent, but will always be at least 32 bits. The standard functions on integers (+, -, *, /, ==, >, <, negate, ...) are predefined, and the binary ones use infix notation (see Appendix A for the precedence rules). For example:

3 * -11;

=> -33 : int

7 == 8;

=> f : bool

Overflow  will  return  unpredictable  results. 

The character type char is the set of ASCII characters. The characters have a fixed order and all the comparison operations (e.g., ==, <, >=, ...) can be used. Characters are written by placing a ‘ in front of the character. For example:

‘8;

=> ‘8 : char

‘a == ‘d;

=> f : bool

‘a < ‘d;

=> t : bool

The  global  variables  space,  newline  and  tab  are  bound  to  the  appropriate  characters. 

The type float is used to specify floating-point numbers. The exact representation of these numbers is machine specific, but Nesl tries to use 64-bit IEEE when possible. Floats support most of the same functions as integers, and also have several additional functions (e.g., round, truncate, sqrt, log, ...). Floats must be written with a decimal point so that they can be distinguished from integers.

1.2  *  3.0; 

=>  3.6  :  float 

round(2.1);

=>  2  :  int 
There is no implicit coercion between scalar types. To add 2 and 3.0, for example, it is necessary to coerce one of them, e.g.:

float(2) + 3.0;

=> 5.0 : float

A complete list of the functions available on scalar types can be found in Appendix B.1.
3.1.2  Sequences ([])

A sequence is an ordered set of values. A sequence can contain any type, including other sequences, but each element in a sequence must be of the same type (sequences are homogeneous). The type of a sequence whose elements are of type a is specified as [a]. For example:

[6, 2, 4, 5];

=> [6, 2, 4, 5] : [int]

[[2, 1, 7, 3], [6, 2], [22, 9]];

=> [[2, 1, 7, 3], [6, 2], [22, 9]] : [[int]]

Sequences  of  characters  can  be  written  between  double  quotes, 

"a  string"; 

=>  "a  string"  :  [char] 

but  can  also  be  written  as  a  sequence  of  characters: 

[‘a, space, ‘s, ‘t, ‘r, ‘i, ‘n, ‘g];

=>  "a  string"  :  [char] 

Empty sequences must be explicitly typed since the type cannot be determined from the elements. The type of an empty sequence is specified by using empty square brackets followed by the type of the elements. For example,

[] int;

=> [] : [int]

[] (int, bool);

=> [] : [(int, bool)]

Appendix  B.2  describes  the  functions  that  operate  on  sequences. 
3.1.3  Record Types (datatype)

Record types with a fixed number of slots can be defined with the datatype construct. For example,

datatype complex(float, float);

=> complex(a1,a2) : float, float -> complex

defines a record with two slots, both of which must contain a floating-point number. Defining a record also defines a corresponding function that is used to construct the record. For example,

complex(7.1,11.9);

=> complex(7.1,11.9) : complex

creates a complex record with 7.1 and 11.9 as its two values. The type of the record is specified as complex.

Elements of a record can be accessed using pattern matching in the let construct. For example,

let complex(real, imaginary) = a
in real;

will extract the real part of the variable a (assuming it is kept in the first slot). More details on pattern matching are given in the next section.

As with functions, records can be parameterized based on type-variables. For example, complex could have been defined as:

datatype complex(alpha, alpha) :: alpha in number;

=> complex(a1,a2) : alpha, alpha -> complex(alpha) :: alpha in number

This specifies that for alpha bound to any type in the type-class number (either int or float), both slots must be of type alpha. This will allow either

complex(7.1, 11.9);

=> complex(7.1, 11.9) : complex(float)

complex(7, 11);

=> complex(7, 11) : complex(int)

but will not allow complex(7, ‘a) or complex(2, 2.2). The type of a record is specified by the record name followed by the binding of all its type-variables. In this case, the binding of the type-variable is either int or float.
3.2  Functions and Constructs

3.2.1  Conditionals  (if) 

The  only  primitive  conditional  in  Nesl  is  the  if  construct.  The  syntax  is: 

IF  exp  THEN  exp  ELSE  exp 

If  the  first  expression  is  true,  then  the  second  expression  is  evaluated  and  its  result  is 
returned,  otherwise  the  third  expression  is  evaluated  and  its  result  is  returned.  The  first 
expression  must  be  of  type  bool,  and  the  other  two  expressions  must  be  of  identical  types. 
For  example: 

if  (t  and  f)  then  3+4  else  (6  -  2) *7 
is  a  valid  expression,  but 

if  (t  and  f)  then  3  else  2.6 
is  not,  since  the  two  branches  return  different  types. 

3.2.2  Binding  Local  Variables  (let) 

Local  variables  can  be  bound  with  the  let  construct.  The  syntax  is: 

LET expbinds IN exp

expbinds ::= expbind [; expbinds]       variable bindings
expbind  ::= pattern = exp              variable binding
pattern  ::= ident                      variable
           | ident(pattern)             datatype pattern
           | pattern, pattern           pair pattern
           | (pattern)

The semicolon separates bindings (the square brackets indicate an optional term of the syntax). Each pattern is a variable name, a datatype (record) pattern, a pair of patterns, or a parenthesized pattern. Each expbind binds the variables in the pattern on the left of the = to the result of the expression on the right. For example:

let a = 7;
    (b,c) = (1,2)
in a*(b + c);

=> 21 : int

Here  a  is  bound  to  7,  then  the  pattern  (b,  c)  is  matched  with  the  result  of  the  expression 
on  the  right  so  that  b  is  bound  to  1  and  c  is  bound  to  2.  Patterns  can  be  nested,  and  the 
patterns  are  matched  recursively. 

The  variables  in  each  expbind  can  be  used  in  the  expressions  (exp)  of  any  later  expbind 
(the  bindings  are  done  serially).  For  example,  in  the  expression 
let a = 7;
    b = a + 4
in a * b;

=> 77 : int

the  variable  a  is  bound  to  the  value  7  and  then  the  variable  b  is  bound  to  the  value  of  a 
plus  4,  which  is  11.  When  these  are  multiplied  in  the  body,  the  result  is  77. 

3.2.3 The Apply-to-Each Construct ({})

The apply-to-each construct is used to apply any function over the elements of a sequence. It has the following syntax:

{[exp :] rbinds [| exp]}

rbinds ::= rbind [; rbinds]
rbind  ::= pattern IN exp              full binding
         | ident                       shorthand binding

An apply-to-each construct consists of three parts: the expression before the colon, which we will call the body; the bindings that follow the body; and the expression that follows the |, which we will call the sieve. Both the body and the sieve are optional; they can both be left out, as in

{a in [1, 2, 3]};

=> [1, 2, 3] : [int]

The rbinds can contain multiple bindings, which are separated by semicolons. We first consider the case in which there is a single binding. A binding consists either of a pattern followed by the keyword IN and an expression (full binding), or of a variable name (shorthand binding). In a full binding the expression is evaluated (it must evaluate to a sequence) and the variables in the pattern are bound in turn to each element of the sequence. The body and sieve are applied for each of these bindings. For example:

{a + 2: a in [1, 2, 3]};

=> [3, 4, 5] : [int]

{a + b: (a,b) in [(1,2), (3,4), (5,6)]};

=> [3, 7, 11] : [int]

In a shorthand binding, the variable must be bound to a sequence, and the body and sieve are applied to each element of the sequence with the variable name bound to the element. For example:

let a = [1, 2, 3]
in {a + 2: a};

=> [3, 4, 5] : [int]

In the case of multiple rbinds, each of the sequences (either the result of the expression in a full binding or the value of the variable in a shorthand binding) must be of equal length. The bindings are interleaved so that the body is evaluated with bindings made for elements at the same index of each sequence. For example:

{a + b: a in [1, 2, 3]; b in [1, 4, 9]};

=> [2, 6, 12] : [int]

{dist(b,a): a in [1, 2, 3]; b in [1, 4, 9]};

=> [[1], [4, 4], [9, 9, 9]] : [[int]]

An apply-to-each with a body and two bindings,

{body: pattern1 in exp1; pattern2 in exp2 | sieve}

is equivalent to the single-binding construct

{body: (pattern1,pattern2) in zip(exp1,exp2) | sieve}

where zip, as defined in the list of functions, elementwise zips together the two sequences it is given as arguments.

If there is no body in an apply-to-each construct, then the values bound by the first binding are returned. For example:

{a in [1, 2, 3]; b in [1, 4, 9]};

=> [1, 2, 3] : [int]

{a in [1, 2, 3]; b in [2, 4, 9] | b == 2*a};

=> [1, 2] : [int]

{b in [2, 4, 9]; a in [1, 2, 3] | b == 2*a};

=> [2, 4] : [int]

If there is a body and a sieve, the body and sieve are both evaluated for all bindings, and then the subselection is applied. An apply-to-each with a sieve of the form:

{body : bindings | sieve}

is equivalent to the construct

pack({(body, sieve) : bindings})

where pack, as defined in the list of functions, takes a sequence of type [(α, bool)] and returns a sequence that contains the first element of each pair whose second element is true. The order of the remaining elements is maintained.
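
The two equivalences above can be modeled sequentially. Multiple bindings behave like a zip, and a sieve behaves like a pack. The following is a Python sketch, not Nesl. The helper apply_to_each and its argument order are hypothetical and were chosen only to mirror {body: x1 in s1; ...; xk in sk | sieve}. Like the text, it evaluates the body and the sieve for every binding and then packs the results. A Nesl implementation would do those evaluations in parallel.

```python
def pack(pairs):
    """Keep the value of each (value, flag) pair whose flag is true, preserving order."""
    return [value for value, flag in pairs if flag]


def apply_to_each(body, sieve, *seqs):
    """Hypothetical model of {body: x1 in s1; ...; xk in sk | sieve}.

    body and sieve are functions of the k bound values, or None when omitted.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all bound sequences must have equal length")
    if body is None:
        # No body: the values of the first binding are returned.
        body = lambda *xs: xs[0]
    if sieve is None:
        # No sieve: every element is kept.
        sieve = lambda *xs: True
    # Multiple bindings behave like one binding over zip(s1, ..., sk).
    bound = list(zip(*seqs))
    # Body and sieve are evaluated for all bindings, then pack subselects.
    return pack([(body(*xs), sieve(*xs)) for xs in bound])


print(apply_to_each(lambda a, b: a + b, None, [1, 2, 3], [1, 4, 9]))        # [2, 6, 12]
print(apply_to_each(lambda a, b: [b] * a, None, [1, 2, 3], [1, 4, 9]))     # [[1], [4, 4], [9, 9, 9]]
print(apply_to_each(None, lambda a, b: b == 2 * a, [1, 2, 3], [2, 4, 9]))  # [1, 2]
```

The second call uses [b] * a in place of dist(b,a).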


3.2.4 Defining New Functions (function)

Functions can be defined at top level using the function construct. The syntax is:

FUNCTION ident pattern [: funtype] = exp ;

A function has one argument, but the argument can be any pattern. The body of a function (the exp at the end) can only refer to variables bound in the pattern or to variables declared at top level. Any function called in the body must either have been defined previously or be the function itself (at present there is no way to define mutually recursive functions). As in other functional languages, defining a function with the same name as a previous function only hides the previous function from future use: all references to the function made before the new definition still refer to the original definition.

3.2.5 Top-Level Bindings (=)

You can bind a variable at top level using the = operator. The syntax is:

ident = exp ;

For example, a = 211; will bind the variable a to the value 211. The variable can then be referenced either at top level or inside any function. For example, the definition

function foo(c) = c + a;

would define a function that adds 211 to its input. Such top-level binding is mostly useful for saving temporary results at top level and for defining constants. The variable pi is bound at top level to the value of π.

A The Nesl Grammar

This appendix defines the grammar of Nesl. The grammatical conventions are:

• The brackets [ ] enclose optional phrases, the symbol * means repeat the previous expression any number of times, and the symbol + means repeat the previous expression any number of times, but at least once.

• All symbols in typewriter font are literal tokens, all symbols in boldface are tokens with the lexical definitions given below, and all symbols in italics are variables (nonterminals) of the grammar.

• All uppercase letters can be either upper or lower case. Nesl is case insensitive.

Toplevel

toplevel ::= FUNCTION ident pattern [: funtype] = exp ;     function definition
           | DATATYPE ident typeexp [:: typebinds] ;          datatype definition
           | ident = exp ;                                    variable binding
           | exp ;                                            expression

Types

funtype   ::= typeexp -> typeexp [:: typebinds]      function type
typebinds ::= typebind [; typebinds]                 binding type variables
typebind  ::= ident IN typeclass                     binding a type variable
typeexp   ::= basetype                               base type
            | ident                                  type variable
            | ident ( [typelist] )                   compound datatype
            | typeexp , typeexp                      pair type
            | '[' typeexp ']'                        sequence type
            | ( typeexp )
typelist  ::= typeexp [, typelist]                   type list
typeclass ::= NUMBER | ORDINAL | LOGICAL | ANY       the type classes
basetype  ::= INT | BOOL | FLOAT | CHAR              the base types

Expressions

exp      ::= const                            constant
           | ident                            variable
           | IF exp THEN exp ELSE exp         conditional
           | LET expbinds IN exp              local bindings
           | { [exp :] rbinds [| exp] }       apply-to-each
           | ident (exp)                      function application
           | exp binop exp                    binary operator
           | unaryop exp                      unary operator
           | sequence                         sequence
           | exp '[' exp ']'                  sequence extraction
           | ( exp )                          parenthesized expression

expbinds ::= pattern = exp [; expbinds]       variable bindings

pattern  ::= ident                            variable
           | ident (pattern)                  datatype pattern
           | pattern, pattern                 pair pattern
           | ( pattern )

rbinds   ::= rbind [; rbinds]

rbind    ::= pattern IN exp                   iteration binding
           | ident                            shorthand form

sequence ::= '[' explist ']'                  listed sequence
           | '[' ']' typeexp                  empty sequence
           | '[' exp : exp [: exp] ']'        integer range

explist  ::= exp [, explist]

const    ::= intconst                         fixed precision integer
           | floatconst                       fixed precision float
           | boolconst                        boolean (T or F)
           | stringconst                      character string

binop    ::= ,                                precedence 1
           | OR | NOR | XOR                   precedence 2
           | AND | NAND                       precedence 3
           | == | /= | < | > | <= | >=        precedence 4
           | + | - | ++ | <-                  precedence 5
           | * | / | ->                       precedence 6

unaryop  ::= # | @ | -                        precedence 7

Lexical Definitions

The following defines regular expressions for the lexical classes of tokens. The grammatical conventions are:

• All uppercase letters can be either upper or lower case. Nesl is case insensitive.

• The parentheses ( ) enclose an expression. The brackets [ ] enclose a character set, any one of which must match. The expression 0-9 within square brackets means all digits and the expression A-Z means all letters. The symbol ^ as the first character within square brackets means a complement character set (all characters except the following ones).

• The symbol * means the previous expression can be repeated as many times as needed (including zero times), the symbol + means the previous expression can be repeated as many times as needed but at least once, and the symbol ? means the previous expression can be matched either once or not at all.

intconst    ::= [-+]? [0-9]+

floatconst  ::= [-+]? [0-9]* . [0-9]+ ([eE] [-+]? [0-9]+)?

ident       ::= [_A-Z0-9]+

boolconst   ::= [TF]

stringconst ::= "[^"]*"
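
To check how these regular expressions read, they can be transcribed directly for Python's standard re module. In this sketch, re.IGNORECASE models case insensitivity. The LEXICAL table and the classes_of helper are hypothetical names for illustration only. The patterns overlap: 42 also matches ident, and t matches both ident and boolconst. A real lexer therefore needs a priority rule, which this sketch deliberately leaves out.

```python
import re

# Regular expressions transcribed from the lexical definitions above.
LEXICAL = {
    "intconst": r"[-+]?[0-9]+",
    "floatconst": r"[-+]?[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?",
    "ident": r"[_A-Z0-9]+",
    "boolconst": r"[TF]",
    "stringconst": r'"[^"]*"',
}


def classes_of(token):
    """Return every lexical class whose expression matches the entire token."""
    return [name for name, rx in LEXICAL.items()
            if re.fullmatch(rx, token, re.IGNORECASE)]


print(classes_of("-.5e10"))  # ['floatconst']
print(classes_of("3."))      # [] (a digit is required after the point)
```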

B List of Functions

This section lists the functions available in Nesl. Each function is listed in the following way:

function interface {source-types -> result-type : type-bindings}

Definition of function.

The hierarchy of the type classes is shown in Figure 5.

B.1 Scalar Functions

Logical Functions

All the logical functions work on either integers or booleans. In the case of integers, they work bitwise over the bit representation of the integer.

not(a) {α -> α : α in logical}

Returns the logical inverse of the argument. For integers, this is the ones' complement.

a or b {α, α -> α : α in logical}

Returns the inclusive or of the two arguments.

a and b {α, α -> α : α in logical}

Returns the logical and of the two arguments.

a xor b {α, α -> α : α in logical}

Returns the exclusive or of the two arguments.

a nor b {α, α -> α : α in logical}

Returns the inverse of the inclusive or of the two arguments.

a nand b {α, α -> α : α in logical}

Returns the inverse of the and of the two arguments.

Comparison Functions

All comparison functions work on integers, floats and characters.

a == b {α, α -> bool : α in ordinal}

Returns t if the two arguments are equal.

a /= b {α, α -> bool : α in ordinal}

Returns t if the two arguments are not equal.

a < b {α, α -> bool : α in ordinal}

Returns t if the first argument is strictly less than the second argument.

a > b {α, α -> bool : α in ordinal}

Returns t if the first argument is strictly greater than the second argument.

a <= b {α, α -> bool : α in ordinal}

Returns t if the first argument is less than or equal to the second argument.

a >= b {α, α -> bool : α in ordinal}

Returns t if the first argument is greater than or equal to the second argument.

Predicates

plusp(v) {α -> bool : α in number}

Returns t if v is strictly greater than 0.

minusp(v) {α -> bool : α in number}

Returns t if v is strictly less than 0.

zerop(v) {α -> bool : α in number}

Returns t if v is equal to 0.

oddp(v) {int -> bool}

Returns t if v is odd (not divisible by two).

evenp(v) {int -> bool}

Returns t if v is even (divisible by two).

Arithmetic Functions

a + b {α, α -> α : α in number}

Returns the sum of the two arguments.

a - b {α, α -> α : α in number}

Subtracts the second argument from the first.

-v {α -> α : α in number}

Negates a number.

abs(x) {α -> α : α in number}

Returns the absolute value of the argument.

diff(x, y) {α, α -> α : α in number}

Returns the absolute value of the difference of the two arguments.

max(a, b) {α, α -> α : α in ordinal}

Returns the argument that is greatest (closest to positive infinity).

min(a, b) {α, α -> α : α in ordinal}

Returns the argument that is least (closest to negative infinity).

v * d {α, α -> α : α in number}

Returns the product of the two arguments.

v / d {α, α -> α : α in number}

Returns v divided by d. If the arguments are integers, the result is truncated toward 0.

rem(v, d) {int, int -> int}

Returns the remainder after dividing v by d.

lshift(a, b) {int, int -> int}

Returns the first argument logically shifted to the left by the number of bits given by the second argument. Shifting fills with 0-bits.

rshift(a, b) {int, int -> int}

Returns the first argument shifted to the right by the number of bits given by the second argument. Shifting fills with 0-bits or with the sign bit, depending on the implementation.

sqrt(v) {float -> float}

Returns the square root of the argument. The argument must be nonnegative.

isqrt(v) {int -> int}

Returns the greatest integer less than or equal to the exact square root of the integer argument. The argument must be nonnegative.

ln(v) {float -> float}

Returns the natural log of the argument.

log(v, b) {float, float -> float}

Returns the logarithm of v in the base b.

exp(v) {float -> float}

Returns e raised to the power v.

expt(v, p) {float, float -> float}

Returns v raised to the power p.

sin(v) {float -> float}

Returns the sine of v, where v is in radians.

cos(v) {float -> float}

Returns the cosine of v, where v is in radians.

tan(v) {float -> float}

Returns the tangent of v, where v is in radians.

asin(v) {float -> float}

Returns the arc sine of v. The result is in radians.

acos(v) {float -> float}

Returns the arc cosine of v. The result is in radians.

atan(v) {float -> float}

Returns the arc tangent of v. The result is in radians.

sinh(v) {float -> float}

Returns the hyperbolic sine of v ((e^v - e^(-v))/2).

cosh(v) {float -> float}

Returns the hyperbolic cosine of v ((e^v + e^(-v))/2).

tanh(v) {float -> float}

Returns the hyperbolic tangent of v ((e^v - e^(-v))/(e^v + e^(-v))).

Conversion Functions

btoi(a) {bool -> int}

Converts the boolean values t and f into 1 and 0, respectively.

code_char(a) {int -> char}

Converts an integer to a character. The integer must be the code for a valid character.

char_code(a) {char -> int}

Converts a character to its integer code.

float(v) {int -> float}

Converts an integer to a floating-point number.

ceil(v) {float -> int}

Converts a floating-point number to an integer by rounding toward positive infinity.

floor(v) {float -> int}

Converts a floating-point number to an integer by rounding toward negative infinity.

trunc(v) {float -> int}

Converts a floating-point number to an integer by truncating toward zero.

round(v) {float -> int}

Converts a floating-point number to an integer by rounding to the nearest integer; if the number is exactly halfway between two integers, which of the two it is rounded to is implementation specific.

Other Scalar Functions

rand(v) {α -> α : α in number}

For a positive value v, rand returns a random value in the range [0..v).

B.2 Sequence Functions

Simple Sequence Functions

#v {[α] -> int : α in any}

Returns the length of a sequence.

dist(a, l) {α, int -> [α] : α in any}

Generates a sequence of length l with the value a in each element. For example:

a = a0
l = 5
dist(a, l) = [a0, a0, a0, a0, a0]

a[i] {[α], int -> α : α in any}

Extracts the element specified by index i from the sequence a. Indices are zero-based.

rep(d, v, i) {[α], α, int -> [α] : α in any}

Replaces the ith value in the sequence d with the value v. For example:

d = [a0, a1, a2, a3, a4]
v = b0
i = 3
rep(d, v, i) = [a0, a1, a2, b0, a4]

zip(a, b) {[α], [β] -> [(α, β)] : α in any; β in any}

Zips two sequences of equal length together into a single sequence of pairs.
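
The following Python model mirrors dist, rep and zip as described above. It uses plain lists, and its names are illustrative. nesl_zip is a hypothetical name chosen to avoid shadowing Python's built-in zip. Like the functional originals, rep returns a new sequence and leaves d unchanged.

```python
def dist(a, l):
    """A sequence of length l with the value a in every element."""
    return [a] * l


def rep(d, v, i):
    """A copy of d whose zero-based position i holds v; d itself is not modified."""
    out = list(d)
    out[i] = v
    return out


def nesl_zip(a, b):
    """Elementwise pairs of two sequences that must have equal length."""
    if len(a) != len(b):
        raise ValueError("zip requires sequences of equal length")
    return list(zip(a, b))


print(rep(["a0", "a1", "a2", "a3", "a4"], "b0", 3))  # ['a0', 'a1', 'a2', 'b0', 'a4']
```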

Scans and Reduces

plus_scan(a) {[α] -> [α] : α in number}

Given a sequence of numbers, plus_scan returns, at each position of a new equal-length sequence, the sum of all previous positions in the source. For example:

a = [1, 3, 5, 7, 9, 11, 13, 15]
plus_scan(a) = [0, 1, 4, 9, 16, 25, 36, 49]

max_scan(a) {[α] -> [α] : α in ordinal}

Given a sequence of ordinals, max_scan returns, at each position of a new equal-length sequence, the maximum of all previous positions in the source. For example:

a = [3, 2, 1, 6, 5, 4, 8]
max_scan(a) = [-∞, 3, 3, 3, 6, 6, 6]

min_scan(a) {[α] -> [α] : α in ordinal}

Given a sequence of ordinals, min_scan returns, at each position of a new equal-length sequence, the minimum of all previous positions in the source.

or_scan(a) {[bool] -> [bool]}

A scan using logical-or on a sequence of booleans.

and_scan(a) {[bool] -> [bool]}

A scan using logical-and on a sequence of booleans.

[s:e:d] {int, int, int -> [int]}

Returns a sequence of indices starting at s, increasing by d, and finishing before e. For example:

s = 4
e = 15
d = 3
[s:e:d] = [4, 7, 10, 13]
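
All the scans above are exclusive. Position i combines only the elements strictly before i, and position 0 receives the operator's identity. The following Python sketch computes them sequentially, although Nesl computes them in parallel. exclusive_scan and index_range are hypothetical helper names. The default stride of 1 in index_range is an assumption about [s:e] written without a stride.

```python
import math


def exclusive_scan(op, identity, a):
    """Position i gets op folded over a[0], ..., a[i-1]; position 0 gets the identity."""
    out, acc = [], identity
    for x in a:
        out.append(acc)
        acc = op(acc, x)
    return out


def plus_scan(a):
    return exclusive_scan(lambda x, y: x + y, 0, a)


def max_scan(a):
    # -math.inf stands in for the -infinity shown in the example above.
    return exclusive_scan(max, -math.inf, a)


def or_scan(flags):
    return exclusive_scan(lambda x, y: x or y, False, flags)


def index_range(s, e, d=1):
    """Model of [s:e:d]: start at s, step by d, stop before e."""
    return list(range(s, e, d))


print(plus_scan([1, 3, 5, 7, 9, 11, 13, 15]))  # [0, 1, 4, 9, 16, 25, 36, 49]
print(index_range(4, 15, 3))                    # [4, 7, 10, 13]
```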

sum(v) {[α] -> α : α in number}

Given a sequence of numbers, sum returns their sum. For example:

v = [7, 2, 9, 11, 3]
sum(v) = 32

max_val(v) {[α] -> α : α in ordinal}

Given a sequence of ordinals, max_val returns their maximum.

min_val(v) {[α] -> α : α in ordinal}

Given a sequence of ordinals, min_val returns their minimum.

any(v) {[bool] -> bool}

Given a sequence of booleans, any returns t iff any of them are t.

all(v) {[bool] -> bool}

Given a sequence of booleans, all returns t iff all of them are t.

count(v) {[bool] -> int}

Counts the number of t flags in a boolean sequence. For example:

v = [T, F, T, T, F, T, F, T]
count(v) = 5

max_index(v) {[α] -> int : α in ordinal}

Given a sequence of ordinals, max_index returns the index of the maximum value. If several values are equal, it returns the leftmost index. For example:

v = [2, 11, 4, 7, 14, 6, 9, 14]
max_index(v) = 4

min_index(v) {[α] -> int : α in ordinal}

Given a sequence of ordinals, min_index returns the index of the minimum value. If several values are equal, it returns the leftmost index.

Sequence Reordering Functions

values -> indices {[α], [int] -> [α] : α in any}

Given a sequence of values on the left and a sequence of indices on the right, which can be of different lengths, -> returns a sequence which is the same length as the indices sequence and the same type as the values sequence. For each position in the indices sequence, it extracts the value at that index of the values sequence. For example:

values = [a0, a1, a2, a3, a4, a5, a6, a7]
indices = [3, 5, 2, 6]
values -> indices = [a3, a5, a2, a6]

permute(v, i) {[α], [int] -> [α] : α in any}

Given a sequence v and a sequence of indices i, which must be of the same length, permute moves each value of v to the position given by the corresponding index in i. The permutation must be one-to-one.

d <- ivpairs {[α], [(int, α)] -> [α] : α in any}

This operator, called put, is used to insert multiple elements into a sequence. Its left argument is the sequence to insert into (the destination sequence) and its right argument is a sequence of integer-value pairs. For each element (i,v) in the sequence of integer-value pairs, the value v is written into position i of the destination sequence, replacing the value that was there.

rotate(a, i) {[α], int -> [α] : α in any}

Given a sequence and an integer, rotate rotates the sequence around by i positions to the right. If the integer is negative, then the sequence is rotated to the left. For example:

a = [a0, a1, a2, a3, a4, a5, a6, a7]
i = 3
rotate(a, i) = [a5, a6, a7, a0, a1, a2, a3, a4]

reverse(a) {[α] -> [α] : α in any}

Reverses the order of the elements in a sequence.
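
The index directions are the easiest part of these functions to confuse. -> reads from the given indices (a gather), while permute and <- write to them (a scatter). The following sequential Python sketch makes the distinction concrete. get and put are hypothetical names for the -> and <- operators. The report does not say which value wins when an index appears twice in <-. This sketch simply keeps the last pair, which is an assumption.

```python
def get(values, indices):
    """values -> indices: result[j] = values[indices[j]] (gather)."""
    return [values[i] for i in indices]


def permute(v, idx):
    """result[idx[j]] = v[j] (scatter); idx must be one-to-one onto 0..len(v)-1."""
    if sorted(idx) != list(range(len(v))):
        raise ValueError("permute requires a one-to-one index sequence")
    out = [None] * len(v)
    for x, i in zip(v, idx):
        out[i] = x
    return out


def put(d, ivpairs):
    """d <- ivpairs: a copy of d with each v written at position i.

    Assumption: with duplicate indices, the last pair wins.
    """
    out = list(d)
    for i, v in ivpairs:
        out[i] = v
    return out


def rotate(a, i):
    """Rotate right by i positions (left when i is negative)."""
    n = len(a)
    if n == 0:
        return []
    k = i % n
    return a[n - k:] + a[:n - k]


vals = [f"a{k}" for k in range(8)]
print(get(vals, [3, 5, 2, 6]))  # ['a3', 'a5', 'a2', 'a6']
print(rotate(vals, 3))          # ['a5', 'a6', 'a7', 'a0', 'a1', 'a2', 'a3', 'a4']
```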

Simple Sequence Manipulation

pack(v) {[(α, bool)] -> [α] : α in any}

Given a sequence of (value, flag) pairs, pack packs all the values with a t in their corresponding flag into consecutive elements, deleting elements with an f.

v1 ++ v2 {[α], [α] -> [α] : α in any}

Given two sequences, ++ appends them. For example:

v1 = [a0, a1, a2]
v2 = [b0, b1]
v1 ++ v2 = [a0, a1, a2, b0, b1]

subseq(v, start, end) {[α], int, int -> [α] : α in any}

Given a sequence, subseq returns the subsequence starting at position start and ending one before position end. For example:

v = [a0, a1, a2, a3, a4, a5, a6, a7]
start = 2
end = 6
subseq(v, start, end) = [a2, a3, a4, a5]

drop(v, n) {[α], int -> [α] : α in any}

Given a sequence, drop drops the first n items from the sequence. For example:

v = [a0, a1, a2, a3, a4, a5, a6, a7]
n = 3
drop(v, n) = [a3, a4, a5, a6, a7]

take(v, n) {[α], int -> [α] : α in any}

Given a sequence, take takes the first n items from the sequence. For example:

v = [a0, a1, a2, a3, a4, a5, a6, a7]
n = 3
take(v, n) = [a0, a1, a2]

odd_elts(v) {[α] -> [α] : α in any}

Returns the odd-indexed elements of a sequence.

even_elts(v) {[α] -> [α] : α in any}

Returns the even-indexed elements of a sequence.

interleave(a, b) {[α], [α] -> [α] : α in any}

Interleaves the elements of two sequences. The sequences must be of the same length. For example:

a = [a0, a1, a2, a3]
b = [b0, b1, b2, b3]
interleave(a, b) = [a0, b0, a1, b1, a2, b2, a3, b3]
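
These functions map almost one-to-one onto Python list slicing, as the sketch below shows. The function names follow the entries above for readability. This is not the Nesl library. The final line checks a small identity: interleaving the even- and odd-indexed elements of an even-length sequence reconstructs the original sequence.

```python
def subseq(v, start, end):
    """Elements at positions start, start+1, ..., end-1."""
    return v[start:end]


def drop(v, n):
    """Everything after the first n elements."""
    return v[n:]


def take(v, n):
    """The first n elements."""
    return v[:n]


def odd_elts(v):
    """Elements at odd indices 1, 3, 5, ..."""
    return v[1::2]


def even_elts(v):
    """Elements at even indices 0, 2, 4, ..."""
    return v[0::2]


def interleave(a, b):
    """a0, b0, a1, b1, ...; both sequences must have the same length."""
    if len(a) != len(b):
        raise ValueError("interleave requires sequences of equal length")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


v = [f"a{k}" for k in range(8)]
print(subseq(v, 2, 6))                            # ['a2', 'a3', 'a4', 'a5']
print(interleave(even_elts(v), odd_elts(v)) == v)  # True
```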

Nesting Sequences

The two functions partition and flatten are the primitives for moving between levels of nesting. All other functions for moving between levels of nesting can be built out of these. The functions split and bottop are often useful for divide-and-conquer routines.

partition(v, counts) {[α], [int] -> [[α]] : α in any}

Given a sequence of values and another sequence of counts, partition returns a nested sequence with each subsequence being of a length specified by the counts. The sum of the counts must equal the length of the sequence of values. For example:

v = [a0, a1, a2, a3, a4, a5, a6, a7]
counts = [4, 1, 3]
partition(v, counts) = [[a0, a1, a2, a3], [a4], [a5, a6, a7]]

flatten(v) {[[α]] -> [α] : α in any}

Given a nested sequence of values, flatten flattens the sequence. For example:

v = [[a0, a1, a2], [a3, a4], [a5, a6, a7]]
flatten(v) = [a0, a1, a2, a3, a4, a5, a6, a7]

split(v, flags) {[α], [bool] -> [[α]] : α in any}

Given a sequence of values v and a boolean sequence of flags, split creates a nested sequence of length 2 with all the elements with an f in their flag in the first element and elements with a t in their flag in the second element. For example:

v = [a0, a1, a2, a3, a4, a5, a6, a7]
flags = [T, F, T, F, F, T, T, T]
split(v, flags) = [[a1, a3, a4], [a0, a2, a5, a6, a7]]

bottop(v) {[α] -> [[α]] : α in any}

Given a sequence v, bottop creates a nested sequence of length 2 with all the elements from the bottom half of the sequence in the first element and elements from the top half of the sequence in the second element. For example:

v = [a0, a1, a2, a3, a4, a5, a6]
bottop(v) = [[a0, a1, a2, a3], [a4, a5, a6]]

head_rest(values) {[α] -> α, [α] : α in any}

Given a sequence values of length > 0, head_rest returns a pair containing the first element of the sequence and the remaining elements of the sequence.

rest_tail(values) {[α] -> [α], α : α in any}

Given a sequence values of length > 0, rest_tail returns a pair containing all but the last element of the sequence and the last element of the sequence.
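
The following is a sequential Python sketch of the four nesting functions, named after the entries above. The example shows bottop placing the extra element of an odd-length sequence in the bottom half. The sketch assumes that rule holds in general. For even lengths it splits the sequence evenly. The check flatten(partition(v, counts)) == v expresses the sense in which partition and flatten are inverse primitives.

```python
from itertools import accumulate


def partition(v, counts):
    """Cut v into consecutive pieces whose lengths are given by counts."""
    if sum(counts) != len(v):
        raise ValueError("the counts must sum to the length of v")
    ends = list(accumulate(counts))
    starts = [e - c for e, c in zip(ends, counts)]
    return [v[s:e] for s, e in zip(starts, ends)]


def flatten(nested):
    """Concatenate the subsequences of a nested sequence."""
    return [x for sub in nested for x in sub]


def split(v, flags):
    """[elements flagged false, elements flagged true], each in original order."""
    return [[x for x, f in zip(v, flags) if not f],
            [x for x, f in zip(v, flags) if f]]


def bottop(v):
    """[bottom half, top half]; assumption: an odd extra element goes to the bottom."""
    half = (len(v) + 1) // 2
    return [v[:half], v[half:]]


v = [f"a{k}" for k in range(8)]
print(partition(v, [4, 1, 3]))  # [['a0', 'a1', 'a2', 'a3'], ['a4'], ['a5', 'a6', 'a7']]
print(bottop(v[:7]))            # [['a0', 'a1', 'a2', 'a3'], ['a4', 'a5', 'a6']]
```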

Other Sequence Functions

These are more complex sequence functions. The step complexities of these functions are not O(1).

sort(a) {[int] -> [int]}

Sorts the input sequence. The sort is stable: equal elements maintain their relative order.

rank(a) {[int] -> [int]}

Returns the rank of each element of the sequence a. The rank of an element is the position it would appear in if the sequence were sorted. A sort of a sequence a can be implemented as permute(a, rank(a)). The rank is stable.
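
The identity sort(a) = permute(a, rank(a)) can be checked with a short Python sketch. The sketch computes a stable rank from Python's sorted, which is guaranteed stable, and then scatters each element to its rank. This shows only the relationship stated above and is not how Nesl implements either function.

```python
def rank(a):
    """Stable rank: the index each element would occupy in the sorted sequence."""
    # sorted() is stable, so equal elements keep their input order.
    order = sorted(range(len(a)), key=lambda i: a[i])
    r = [0] * len(a)
    for position, i in enumerate(order):
        r[i] = position
    return r


def permute(v, idx):
    """Send v[j] to position idx[j] of the result (idx must be one-to-one)."""
    out = [None] * len(v)
    for x, i in zip(v, idx):
        out[i] = x
    return out


def sort(a):
    # The identity quoted in the text: sort(a) = permute(a, rank(a)).
    return permute(a, rank(a))


print(rank([5, 1, 4, 1]))  # [3, 0, 2, 1]: the first 1 ranks before the second
print(sort([5, 1, 4, 1]))  # [1, 1, 4, 5]
```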

collect(key_value_pairs) {[(α, β)] -> [(α, [β])] : α in any; β in any}

Takes a sequence of (key, value) pairs and collects each set of values that have the same key together into a sequence. The function returns a sequence of (key, value-sequence) pairs. Each key appears only once in the result, and the value-sequence corresponding to a key contains all the values that had that key in the input.
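
A sequential Python sketch of collect uses a dictionary to group the values. The report does not specify the order of the keys or of the values within each group. This sketch assumes keys appear in order of first appearance and values keep their input order. Python dictionaries preserve insertion order from Python 3.7 onward. The sketch also requires hashable keys, a restriction Nesl does not state.

```python
def collect(key_value_pairs):
    """Group values by key: [(k, v), ...] -> [(k, [v, ...]), ...]."""
    groups = {}
    for key, value in key_value_pairs:
        # Insertion order is preserved, so keys come out in first-seen order.
        groups.setdefault(key, []).append(value)
    return list(groups.items())


print(collect([("x", 1), ("y", 2), ("x", 3)]))  # [('x', [1, 3]), ('y', [2])]
```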

kth_smallest(s, k) {[α], int -> α : α in ordinal}

Returns the kth smallest element of a sequence s (k is 0-based). It uses the quickselect algorithm and therefore has an expected work complexity of O(n) and an expected step complexity of O(lg n).
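
The following is a minimal sequential sketch of randomized quickselect in Python. Each round picks a random pivot, splits the data into the elements below and above it, and keeps only the side containing position k. Every split is a whole-sequence filter, the kind of step a data-parallel version performs with pack. This sketch is not necessarily the exact variant used in the Nesl library.

```python
import random


def kth_smallest(s, k):
    """Return the k-th smallest element (0-based) by randomized quickselect."""
    if not 0 <= k < len(s):
        raise IndexError("k is out of range")
    while True:
        pivot = random.choice(s)
        lesser = [x for x in s if x < pivot]
        equal = sum(1 for x in s if x == pivot)
        if k < len(lesser):
            s = lesser                      # answer lies below the pivot
        elif k < len(lesser) + equal:
            return pivot                    # answer equals the pivot
        else:
            k -= len(lesser) + equal        # answer lies above the pivot
            s = [x for x in s if x > pivot]


print(kth_smallest([7, 2, 9, 11, 3], 2))  # 7
```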

search_for_subseqs(subseq, sequence) {[α], [α] -> [int] : α in any}

Returns the indices of all start positions in sequence where subseq occurs as a contiguous run of elements.
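
The simplest data-parallel formulation of this search tests every candidate start position independently. At each candidate it compares all the elements of subseq. The Python sketch below shows that brute-force approach sequentially, with O(nm) work. It is an illustration of the function's meaning and is not necessarily the algorithm used in the Nesl library. Overlapping occurrences are all reported.

```python
def search_for_subseqs(sub, seq):
    """Start indices of every (possibly overlapping) occurrence of sub in seq."""
    m, n = len(sub), len(seq)
    # Each candidate start is checked independently, so all checks could run in parallel.
    return [i for i in range(n - m + 1)
            if all(seq[i + j] == sub[j] for j in range(m))]


print(search_for_subseqs("ana", "bananas"))  # [1, 3]
```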

remove_duplicates(s) {[α] -> [α] : α in any}

Removes duplicates from a sequence. Elements are considered duplicates if eql on them returns T.

union(a, b) {[α], [α] -> [α] : α in any}

Given two sequences, each of which has no duplicates, union returns the union of the elements in the sequences.

intersection(a, b) {[α], [α] -> [α] : α in any}

Given two sequences, each of which has no duplicates, intersection returns the intersection of the elements in the sequences.

name(a) {[α] -> [int] : α in any}

This function assigns an integer label to each unique value of the sequence a. Equal values are always assigned the same label, and different values are always assigned different labels. All the labels are in the range [0..#a) and correspond to the position in a of one of the elements with the same value. The function remove_duplicates(a) could be implemented as {s in a; i in [0:#a]; r in name(a) | r == i}.
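
The implementation of remove_duplicates quoted above keeps exactly one element per label: the element whose own index equals its label. The Python sketch below follows the apply-to-each term by term. The report lets name choose any representative index. This sketch assumes the first occurrence, which gives one deterministic choice.

```python
def name(a):
    """Label each element with the index of the first element equal to it (an assumption)."""
    first = {}
    for i, x in enumerate(a):
        first.setdefault(x, i)
    return [first[x] for x in a]


def remove_duplicates(a):
    # Mirrors {s in a; i in [0:#a]; r in name(a) | r == i}.
    labels = name(a)
    return [s for s, i, r in zip(a, range(len(a)), labels) if r == i]


print(name(["x", "y", "x", "z", "y"]))               # [0, 1, 0, 3, 1]
print(remove_duplicates(["x", "y", "x", "z", "y"]))  # ['x', 'y', 'z']
```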

B.3 Functions on Any Type

eql(a, b) {α, α -> bool : α in any}

Given two objects of the same type, eql returns t if they are equal and f otherwise. Two sequences are equal if they are the same length and their elements are elementwise equal. Two records are equal if their fields are equal.

@v {α -> [char] : α in any}

Given any printable object v, @ converts it into its printable representation as a character string.

hash(a, l) {α, int -> int : α in any}

Hashes the argument a and returns an integer in the range [0..l).

select(flag, v1, v2) {bool, α, α -> α : α in any}

Returns the second argument if the flag is T and the third argument if the flag is F. This differs from an if form in that both v1 and v2 are evaluated.

B.4 Functions with Side Effects

The functions in this section are not purely functional. Unless otherwise noted, none of them can be called in parallel; that is, they cannot be called within an apply-to-each construct. The routines in this section are not part of the core language; they are meant for debugging, I/O, timing and display. Because these functions are new, it is reasonably likely that the interface of some of them will change in future versions. The user should check the most recent documentation.

Input and Output Routines

Of the functions listed in this section, only print_char, print_string, printf_char, printf_string, and print_debug can be called in parallel.

print_char(v) {char -> bool}

Prints a character to standard output.

print_string(v) {[char] -> bool}

Prints a character string to standard output.

print_debug(str, v) {[char], α -> α : α in any}

Prints the character string str, followed by the string representation of the object v and then a newline, to standard output. This function can be useful when debugging.

write_object_to_file(object, filename) {α, [char] -> α : α in any}

Writes an object to a file. The first argument is the object and the second argument is a filename. For example, write_object_to_file([2,3,1,0], "/tmp/foo") would write a vector of integers to the file /tmp/foo. The data is stored in an internal format and can only be read back using read_object_from_file.

write_string_to_f ile(a,  filename)  {[char], [char]  -*•  [char]} 

Prints  a  character  string  to  the  file  named  filename. 

read_object_from_file(object_type, filename) {a, [char] -> a : a in any}

Reads an object from a file. The first argument is an object of the same type as the object
to be read, and the second argument is a filename from which the object will be read. For
example, the call read_object_from_file(0, "/tmp/foo") would read an integer from the
file /tmp/foo, and read_object_from_file([] int, "/tmp/bar") would read a vector of
integers from the file /tmp/bar. The object must have been stored using the function
write_object_to_file.

read_int_seq_from_file(filename) {[char] -> [int]}

Reads a sequence of integers from the file named filename. The file must start with a left
parenthesis, contain the integers separated by whitespace (spaces, newlines or tabs), and
end with a right parenthesis. For example:

(  22  33  11 
10  14 

12  11  ) 

represents the sequence [22, 33, 11, 10, 14, 12, 11].
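To make the file format concrete, here is a minimal Python sketch of a reader for it. It is not part of Nesl: the name parse_int_seq is hypothetical, and the sketch assumes that any run of spaces, tabs or newlines separates entries and that nothing but whitespace appears outside the parentheses.

```python
def parse_int_seq(text: str) -> list[int]:
    """Hypothetical model of the format read by read_int_seq_from_file."""
    body = text.strip()
    # The contents must be enclosed in a left and a right parenthesis.
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("sequence must start with '(' and end with ')'")
    # str.split() with no argument splits on any run of spaces, tabs or newlines.
    return [int(token) for token in body[1:-1].split()]


example = "( 22 33 11\n10 14\n\n12 11 )"
print(parse_int_seq(example))  # [22, 33, 11, 10, 14, 12, 11]
```

Because line breaks and blank lines count only as separators, the layout of the file does not affect the resulting sequence.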

read_float_seq_from_file(filename) {[char] -> [float]}

Reads a sequence of floats from the file named filename. The file must start with a left
parenthesis, contain the floats separated by whitespace (spaces, newlines or tabs), and end
with a right parenthesis. The file may also contain integers (numbers without a decimal
point); these are coerced to floats.
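The coercion rule can be modeled in the same way. In this hypothetical Python sketch (parse_float_seq is not a Nesl function), Python's float() parses both "1.5" and "7", so integer entries come back as floats, matching the behavior described above.

```python
def parse_float_seq(text: str) -> list[float]:
    """Hypothetical model of the format read by read_float_seq_from_file."""
    body = text.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("sequence must start with '(' and end with ')'")
    # float() accepts "1.5" and "7" alike, so integer entries are coerced to floats.
    return [float(token) for token in body[1:-1].split()]


print(parse_float_seq("( 1.5 2\n0.25\t7 )"))  # [1.5, 2.0, 0.25, 7.0]
```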

open_out_file(filename) {[char] -> file_pointer}

Opens a file for writing and returns a file pointer to that file. File pointers cannot be
returned to top-level; they must be used within a single top-level call.

close_file(filep) {file_pointer -> bool}

Closes a file given a file pointer. It returns T if the close was successful and F otherwise.

printf_char(a, filep) {char, file_pointer -> bool}

Prints a character to the file pointed to by the file pointer filep.

printf_string(a, filep) {[char], file_pointer -> bool}

Prints a character string to the file pointed to by the file pointer filep.
Plotting Functions

The functions in this section can be used for plotting data on an X Window display. Color
plotting is not currently supported.

make_window(((x0, y0), width, height), bbox, display)

{((int, int), int, int), boundingbox, [char] -> window}

Creates a window on the display specified by display. Its upper left-hand corner will be at
position (x0, y0) on the screen, and its size is given by width and height. The bbox
argument specifies the bounding box for the data to be plotted in the window. The bounding
box is a structure that specifies the virtual coordinates of the window; it can be created
with the function bounding_box. Note that windows are automatically closed when you
return to top-level. This means that you cannot return a window to top-level and then use
it; you must create it and use it within a single top-level call.

bounding_box(points) {[(a, b)] -> boundingbox : a in number; b in number}

Creates a bounding box to be used by make_window. Given a sequence of points, the box
is determined by the maximum and minimum x and y values.
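The rule "determined by the maximum and minimum x and y values" amounts to the smallest axis-aligned rectangle containing all the points. The following Python sketch illustrates it; the BoundingBox record and its field names are assumptions made for illustration, since the internal layout of Nesl's boundingbox type is not described here.

```python
from typing import NamedTuple, Sequence


class BoundingBox(NamedTuple):
    """Hypothetical representation; Nesl's boundingbox type is opaque here."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bounding_box(points: Sequence[tuple[float, float]]) -> BoundingBox:
    """Smallest axis-aligned box that contains every (x, y) point."""
    if not points:
        raise ValueError("need at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return BoundingBox(min(xs), min(ys), max(xs), max(ys))


print(bounding_box([(3, 1), (-2, 4), (0, -5)]))
# BoundingBox(xmin=-2, ymin=-5, xmax=3, ymax=4)
```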

draw_points(points, window) {[(b, a)], window -> int : a in number; b in number}

Draws a sequence of points into the window specified by window. The window must have
been created by make_window.

draw_lines(points, window) {[(b, a)], window -> int : a in number; b in number}

Draws a sequence of lines between the points in the points argument into the window
specified by window.

draw_segments(segs, window)

{[((c, d), (a, b))], window -> int : a in number; b in number; c in number; d in number}

Draws a sequence of line segments into the window specified by window. Each line segment
is specified as a pair of points.

close_window(window) {window -> int}

Closes a window created with make_window. After this command executes, the window
will not accept any more draw commands, and it will go away if you click on it.

Other Side-Effecting Functions

time(a) {a -> a, float : a in any}

The expression time(exp) returns a pair whose first element is the value of the expression
exp and whose second element is the time in seconds taken to execute exp. This function
cannot be called in parallel (within an apply-to-each).
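For readers who want the same (value, seconds) contract outside Nesl, here is a minimal Python sketch. The name timed is hypothetical. Python evaluates arguments before a call, so the sketch takes a zero-argument function (a thunk) in place of Nesl's time(exp), which receives the expression itself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(thunk: Callable[[], T]) -> tuple[T, float]:
    """Return (value, elapsed seconds), mirroring the pair returned by Nesl's time."""
    start = time.perf_counter()      # monotonic clock suited to interval timing
    value = thunk()
    elapsed = time.perf_counter() - start
    return value, elapsed


value, seconds = timed(lambda: sum(range(1_000_000)))
print(value)  # 499999500000
```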




C Implementation Notes

This section gives some hints for writing efficient code in Nesl. Most of these hints are
based on the current implementation; the tradeoffs are likely to change in future
implementations. The section also points out some deficiencies of the current implementation.

The Read-Eval-Print Loop

Here we outline how the interactive environment of Nesl works. This should give the user
some feeling for where the time is going. When the user types an expression at top-level,
the following steps take place:

1. The expression is compiled into the intermediate language Vcode.

2. All code in the expression's call tree that has not been previously compiled is
compiled into Vcode. When you define a function in Nesl, it is only partially compiled
immediately; the compilation is completed the first time the function is called. Because
of this delayed compilation, a function can take longer to run the first time it is used.

3. The Vcode for the expression and for all functions in its call tree is written to a file.
The user can inspect this file if desired (see the user's manual).

4. The environment starts a subprocess that runs the Vcode interpreter on the Vcode
file. The Vcode interpreter is a stand-alone executable program.

5. When the interpreter completes the computation, it writes the output to a new file
and exits.

6. When the interpreter has finished writing the output, the Nesl environment reads
the output file, interprets the data and prints the result.

When executing on a remote machine, the only step that differs is Step 4. Instead of
executing the interpreter locally, the environment executes the appropriate version of the
interpreter remotely (using rsh). If the remote machine is on a shared file system, such as
AFS, no files need to be explicitly copied. If it is not, the system copies the Vcode file to
the remote machine before execution and copies the results back when the interpreter
completes.
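The first-call cost described in Step 2 follows the general pattern of compiling on first use and caching the result. The Python class below is only an analogy for that pattern, not Nesl's implementation. LazyFunction is a hypothetical name, and Python's built-in compile() stands in for translation into Vcode.

```python
class LazyFunction:
    """Hypothetical model of delayed compilation: compile on first call, then reuse."""

    def __init__(self, name: str, source: str) -> None:
        self.name = name
        self.source = source      # expression text, standing in for a function definition
        self.code = None          # compiled form, filled in on the first call
        self.compile_count = 0

    def __call__(self, **env):
        if self.code is None:
            # Stand-in for the expensive translation step; it runs only once.
            self.code = compile(self.source, self.name, "eval")
            self.compile_count += 1
        return eval(self.code, {}, env)


square = LazyFunction("square", "x * x")
print(square(x=3), square.compile_count)  # 9 1
print(square(x=4), square.compile_count)  # 16 1
```

Only the first call pays for compilation. Later calls reuse the cached code, which mirrors the "slower the first time" behavior noted in Step 2.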

Using Large Data Sets

In the current implementation of Nesl, users need to be somewhat careful about returning
large data sets to the top-level interpreter, or passing large data sets in as arguments at
top-level (we consider a data set large if it contains more than 10,000 or so elements). Such
passing can be quite slow, since it requires writing the data out to a file and then reading
it back in. To avoid this bottleneck, we suggest using one of the Nesl I/O functions
to read and write the data (e.g. read_object_from_file, read_int_seq_from_file and
write_object_to_file).

For example, if a user had an application solve that required a large sequence of pairs
as input and returned another large sequence of pairs as output, the best way to write this
would be:

function solve_from_file(infilename, outfilename) =
let in_data = read_object_from_file([] (int, int), infilename);
    result = solve(in_data);
    tmp = write_object_to_file(result, outfilename)
in take(result, 100)

Note that the solve_from_file function only returns the first 100 elements of the result.
This makes it possible to check that the result looks reasonable without returning the whole
thing, which would be slow. Instead, the whole result is written to a file.
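The same pattern can be written in Python: send the full result to disk and return only a small sample to the caller. This sketch is an analogy, not a Nesl API. JSON stands in for Nesl's internal object format, and the solve passed in below is a hypothetical stand-in that swaps each pair.

```python
import json
import os
import tempfile
from typing import Callable

Pair = tuple[int, int]


def solve_from_file(infilename: str, outfilename: str,
                    solve: Callable[[list[Pair]], list[Pair]],
                    preview: int = 100) -> list[Pair]:
    """Read input from a file, write the full result to a file, return only a prefix."""
    with open(infilename) as f:
        in_data = [tuple(p) for p in json.load(f)]
    result = solve(in_data)
    with open(outfilename, "w") as f:
        json.dump(result, f)      # the whole result goes to disk
    return result[:preview]       # only a small sample comes back to the caller


with tempfile.TemporaryDirectory() as tmp:
    inp = os.path.join(tmp, "in.json")
    out = os.path.join(tmp, "out.json")
    with open(inp, "w") as f:
        json.dump([[i, i * i] for i in range(1000)], f)
    # Hypothetical stand-in for solve: swap each pair.
    head = solve_from_file(inp, out, lambda pairs: [(b, a) for a, b in pairs])
    print(len(head), head[2])  # 100 (4, 2)
```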

The Truth about Complexity

Equations 1 and 2 in Section 1.5 specified how the work and step complexities are combined
in an apply-to-each. In the current implementation there are a couple of caveats to this
that the user should be aware of. The first concerns work complexity. In the following
discussion we will call a variable free with regard to an apply-to-each if the variable is
free (not bound) in the body of the apply-to-each and is not defined in the bindings of the
apply-to-each. For example, in

{foo(a, b*c) : b in a}

the variables a and c are free with regard to the apply-to-each, while b is not. We will
refer to these variables as free-vars. In the current implementation, all free-vars must
be copied across the instances of an apply-to-each. This copying takes time, and the
equation for combining work complexity that includes this cost is:


$$W(\{e_1(a) : a \in e_2(b)\}) = W(e_2(b)) + \mathrm{sum}(\{W(e_1(a)) : a \in e_2(b)\}) + \sum_{c \in \text{free-vars}} \bigl(\mathrm{Length}(e_2(b)) \times \mathrm{Size}(c)\bigr)$$

where the last term has been added to Equation 1 (Length(e2(b)) refers to the length
of the sequence returned by e2(b), and Size(c) refers to the size of each free-var). If a
free-var is large, this copy can be the dominant cost of an apply-to-each. Here are some
examples of such cases:

Expression                         Work Complexity

{#a + i : i in a}                  (#a)^2

{a[i] : i in b}                    #a × #b

In both cases the work is a factor of #a greater than we might expect, since the sequence a
needs to be copied over the instances. Besides requiring extra work, these copies require
significant extra memory and can become a memory bottleneck in a program. Both of the above
examples can easily be rewritten to reduce the work and memory:
Expression                         Work Complexity

let b = #a in {b + i : i in a}     #a

a->b                               #b

The user should be aware of these costs and rewrite such expressions accordingly.
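To see how the free-var term produces the costs in the tables above, the following Python sketch evaluates the work equation numerically. The helper apply_to_each_work is hypothetical, and the unit costs assigned to the generator and to each body instance are illustrative assumptions. Only the growth of the copy term matters.

```python
def apply_to_each_work(gen_work, body_works, free_var_sizes):
    """Work of {e1(a) : a in e2(b)} including the free-var copying term.

    gen_work: W(e2(b)); body_works: W(e1(a)) for each instance;
    free_var_sizes: Size(c) for each free-var c of the body.
    """
    length = len(body_works)  # Length(e2(b)): one instance per element
    copy_cost = sum(length * size for size in free_var_sizes)
    return gen_work + sum(body_works) + copy_cost


n = 10_000
# {#a + i : i in a}: the whole sequence a (size n) is a free-var copied to n instances.
naive = apply_to_each_work(gen_work=1, body_works=[1] * n, free_var_sizes=[n])
# let b = #a in {b + i : i in a}: only the scalar b (size 1) is copied.
hoisted = apply_to_each_work(gen_work=1, body_works=[1] * n, free_var_sizes=[1])
print(naive, hoisted)  # 100010001 20001
```

The copy term n × n dominates the first version. Hoisting #a into a scalar reduces the copy term to n, so the total work becomes linear in #a.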

A second problem with the current implementation is that Equation 2 (the combining
rule for step complexity) only holds if the body of the apply-to-each is contained.
Contained code is code in which only one branch of a conditional has a non-constant step
complexity. For example, the following function is not contained, because both branches
of the inner if have S(n) > O(1):

function power(a, n) =
if (n == 0) then 1
else
  if evenp(n)
  then square(power(a, n/2))
  else a * square(power(a, n/2))

This can be fixed by calculating power(a, n/2) outside the conditional:

function power(a, n) =
if (n == 0) then 1
else
  let pow = power(a, n/2)
  in if evenp(n)
     then square(pow)
     else a * square(pow)


In future implementations of Nesl it is likely that this restriction will be removed.
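The rewrite is an instance of exponentiation by squaring in which the single recursive call is hoisted above the conditional. The Python sketch below shows only that structural transformation. Python has no data-parallel step-complexity model, so the "contained" property is not checked here, and the sketch assumes a non-negative integer n.

```python
def square(x: int) -> int:
    return x * x


def power(a: int, n: int) -> int:
    """Exponentiation by squaring, structured like the contained version:
    the recursive call runs once, before the conditional, so neither
    branch of the if does more than a constant amount of extra work."""
    if n == 0:
        return 1
    half = power(a, n // 2)  # hoisted out of the conditional
    return square(half) if n % 2 == 0 else a * square(half)


print(power(3, 13))  # 1594323
```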


Note: This report is an updated version of CMU-CS-92-103, which described version 2.4 of the language.
The most significant changes in version 2.6 are that it supports polymorphic types, has an ML-like syntax
instead of a Lisp-like syntax, and includes support for I/O.
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t ration  of  ts  programs  on  the  oasis  of  m  gor  cn-v , * 
orientation  or  in  violation  of  federa  state  or  Oi.  \  -iw 

Inquiries  concerning  app!'Cat.on  of  these  state-r  <  r  ’  . 
Mellon  University  5000  forties  Aver '...e  f-M-sn  jfgr  p.* 
President  for  Enrollment.  Carnegie  Motion  ijn-versity 
telephone  (412)  268-2056 


